## EAE - Introduction to Programming Languages for Data 
## Day 4 - 30/11/2023

### Instructor:  
Enric Domingo  
*Machine Learning and Software Engineer at ERNI*  
edomingod@professional.eae.es

#### Python Libraries for Data:

- Finish prev. session
0. Recap
1. Intro to Libraries
2. Creating and using our first module
3. The Python Standard Library
4. The pip package installer and using external libraries
5. Exercises

---
## 0. Recap

In the previous session we saw: 
1. Dictionaries
2. Tuples
3. Sets
4. File I/O
5. Intro to Classes and Object Oriented Programming

---
## 1. Intro to Python Libraries

Python libraries are reusable chunks of code that perform specific tasks or a set of related tasks. These libraries save developers from having to write code from scratch for every single function or feature they want to implement. They are essentially modules or packages that are designed to be used in other programs.

Python libraries can be built-in, like the Python Standard Library that comes with the Python installation, or they can be third-party libraries that are developed and maintained by independent developers or organizations. These libraries can be used for a wide range of applications, including web development, data science, machine learning, automation, and many more. Examples of popular Python libraries that we will see in future days include:

- **NumPy** for numerical computations 
  
- **Pandas** for data manipulation
  
- **Matplotlib** for data visualization

Some other popular ones are TensorFlow and PyTorch to create Deep Learning models, scikit-learn for Machine Learning, Streamlit for basic data web apps, etc.

To use a library in a Python program, you simply need to import it using the import statement. Once imported, you can use any functions, methods, and types provided by the library in your code.
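
As a small illustration of the import statement, the sketch below shows three common forms using only Standard Library modules. The variable names (`area`, `fruit`, `average`) are made up for this example.

```python
import math                      # import the whole module; names need the "math." prefix
from random import choice        # import one name directly into our namespace
import statistics as stats       # import a module under a shorter alias

area = math.pi * 2 ** 2                  # area of a circle with radius 2
fruit = choice(["apple", "banana"])      # no "random." prefix needed
average = stats.mean([10, 20, 30])       # call through the alias

print(round(area, 2))
print(fruit)
print(average)
```

Which form to use is mostly a readability choice: the prefix makes it obvious where a function comes from, while importing single names keeps lines shorter.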

The term library is commonly used as a general name for Python modules, packages, and frameworks. In this tutorial, we’ll use the term library to refer to any of these:

- A **module** is a single Python file that can be imported. 

- A **package** is a directory of Python modules. 

- A **framework** is a collection of packages.

---
## 2. Creating and using our first module

The best way to understand what a library or a module is in Python is to open the black box: create our own module and use it in our code.

To do that, we need to create a new Python file, which can be in the same directory (folder) as our main program or in a different one. In that file we will write functions, classes, variables or anything else useful and reusable that we want to use in our main program.

After that, we will need to import the module in our main program and use it.

Let's see an example:


In [1]:
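# Hello module example (contents of hello_module.py)

# def hello():
#     print("Hello from the module!")
#
# def add(a, b):        # assumed definition, inferred from add(2, 3) returning 5 below
#     return a + b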

In [2]:
from hello_module import hello  # for this to work, hello_module.py must exist (open and save it in another tab)

hello()

Hello from the module


In [3]:
from hello_module import hello, add

add(2, 3)

5

In [4]:
from hello_module import *

add(3, 4)

# "import *" imports every public name defined in hello_module

7

In [5]:
import hello_module
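
To compare the import styles in one place, here is a self-contained sketch. It writes a throwaway module to a temporary folder so it can run anywhere; the module name `demo_module` and its contents are assumptions made for this example, not the course's `hello_module.py`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny throwaway module to a temporary folder (demo_module is a made-up name).
module_dir = tempfile.mkdtemp()
Path(module_dir, "demo_module.py").write_text(
    "def hello():\n"
    "    return 'Hello from the module!'\n"
    "\n"
    "def add(a, b):\n"
    "    return a + b\n"
)

# Python only finds modules in folders listed in sys.path, so register the folder.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import demo_module                  # style 1: use the module name as a prefix
import demo_module as dm            # style 2: same module under a shorter alias
from demo_module import add         # style 3: import selected names directly

print(demo_module.hello())
print(dm.add(1, 1))
print(add(2, 3))
```

When the module file sits in the same folder as the notebook, the `sys.path` step is not needed, because that folder is searched automatically.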

In [6]:
# A more useful example:

# 1. In this same folder, create a new file and call it my_module.py

# 2. In that file, define a function named multiply_in_range(nums_list, start, stop) that takes a list of numbers
# and returns the product (multiplication) of all the numbers between start (inclusive) and stop (not inclusive).
# (you can try to develop it first here in the following cell and, when it is working, copy and paste it to that file)

# 3. Then, save that file and in the next code cell, import it as 

In [7]:
# this goes in my_module.py

nums_list = [1, 2, 4, 6, 8]
start = 2
stop = 7

# result = 2*4*6 = 48

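def multiply_in_range(nums_list, start, stop):
    final_prod = None                   # stays None if no number falls in the range

    for num in nums_list:
        if start <= num < stop:
            if final_prod is None:      # "not final_prod" would also reset a product of 0
                final_prod = 1

            final_prod *= num
            print(final_prod)

    return final_prod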


multiply_in_range([1, 2, 4, 6, 8], 2, 7)

2
8
48


48

In [8]:
import my_module

# 3 ways of calling the function
my_module.multiply_in_range(nums_list=nums_list, start=start, stop=stop) # one way to call the function
# my_module.multiply_in_range(nums_list, start, stop)
# my_module.multiply_in_range([1,2,4,6,8], 2, 7)


2
8
48
2
8
48


48

In [9]:
import my_module

my_module.multiply_in_range([1,2,4,6,8], 2, 7)

2
8
48


48

In [10]:
# Try to develop here the multiply_in_range function

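def multiply_in_range(nums_list, start, stop):
    final_prod = None                   # stays None if no number falls in the range

    for num in nums_list:
        if start <= num < stop:
            if final_prod is None:      # "not final_prod" would also reset a product of 0
                final_prod = 1

            final_prod *= num
            print(final_prod)

    return final_prod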


multiply_in_range([14, 5, 20, 8, 16], 4, 15)
            


14
70
560


560

In [11]:
import my_module

my_module.multiply_in_range([1,2,3,4,5], 2, 5)  # this example should return 24

2
6
24


24

In [12]:
from my_module import multiply_in_range

multiply_in_range([14, 5, 20, 8, 16], 4, 15)  # imported by name, so no "my_module." prefix; should return 560

14
70
560


560
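
For comparison, the same function can be written more compactly with the Standard Library. This is only a sketch of one alternative, not the expected solution: the name `multiply_in_range_prod` is made up here, and it assumes Python 3.8 or newer, where `math.prod` is available. It skips the intermediate prints and still returns `None` when no number falls in the range.

```python
import math

def multiply_in_range_prod(nums_list, start, stop):
    """Product of the numbers n in nums_list with start <= n < stop (None if there are none)."""
    selected = [n for n in nums_list if start <= n < stop]
    return math.prod(selected) if selected else None

print(multiply_in_range_prod([1, 2, 4, 6, 8], 2, 7))        # 2 * 4 * 6 = 48
print(multiply_in_range_prod([14, 5, 20, 8, 16], 4, 15))    # 14 * 5 * 8 = 560
print(multiply_in_range_prod([1, 2, 3], 10, 20))            # nothing in range, so None
```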

---
## 3. The Python Standard Library

The Python Standard Library is a collection of modules and packages that are intended to be used with the Python programming language. It is distributed with the Python installation, so you don’t need to install any external packages to use it.

You can check the full list of modules available in the Python Standard Library in the official documentation: https://docs.python.org/3/library/index.html

Let's check some of the most useful ones:

- **os**: This module provides a portable way of using operating system dependent functionality. It allows you to access the underlying operating system’s functionality without having to deal with the differences between operating systems. It also provides some useful functions to manipulate files and directories.

- **math**: This module provides access to the mathematical functions defined by the C standard.

- **random**: This module implements pseudo-random number generators for various distributions.

- **time**: This module provides various time-related functions.


In [13]:
# os module

import os

os.getcwd()     # get current working directory

'/Users/nicolasraimundez/Documents/Code - EAE Business School/eae_ipld'

In [23]:
print(os.listdir())         # list all files in current directory
print()
print(os.mkdir("test"))     # create a new directory (mkdir returns None, so None is printed)
print()
print(os.listdir())         # let's see if the folder was created...


['spotify-2023.csv', 'my_module.py', 'hello.py', '.DS_Store', 'house_data.csv', 'weatherAUS.csv', '__pycache__', 'Excercise1day4.py', 'ipld_day07.ipynb', 'stocks_prices.txt', 'cat_img.jpg', 'expensive_houses.csv', 'ipld_day05.ipynb', 'hello_module.py', 'ipld_day01.ipynb', 'ipld_day03.ipynb', 'inverted_image.jpg', 'expensive_houses.xlsx', 'ipld_day04.ipynb', 'ipld_day06.ipynb', 'ipld_day02.ipynb', 'sales.csv']

None

['spotify-2023.csv', 'my_module.py', 'hello.py', '.DS_Store', 'test', 'house_data.csv', 'weatherAUS.csv', '__pycache__', 'Excercise1day4.py', 'ipld_day07.ipynb', 'stocks_prices.txt', 'cat_img.jpg', 'expensive_houses.csv', 'ipld_day05.ipynb', 'hello_module.py', 'ipld_day01.ipynb', 'ipld_day03.ipynb', 'inverted_image.jpg', 'expensive_houses.xlsx', 'ipld_day04.ipynb', 'ipld_day06.ipynb', 'ipld_day02.ipynb', 'sales.csv']
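
One thing to keep in mind: running the cell above a second time fails with `FileExistsError`, because `os.mkdir` refuses to create a folder that already exists. A minimal sketch of a re-runnable variant, using a temporary scratch folder so nothing in the project directory is touched (the `base_dir` and `target` names are made up for this example):

```python
import os
import tempfile

base_dir = tempfile.mkdtemp()               # scratch folder, so nothing real is touched
target = os.path.join(base_dir, "test")     # join builds the path with the right separator

os.makedirs(target, exist_ok=True)          # creates the folder
os.makedirs(target, exist_ok=True)          # second call does nothing instead of raising

print(os.listdir(base_dir))                 # ['test']
print(os.path.isdir(target))                # True
```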


In [24]:
# lists the downloads folder
os.listdir("c:/Users/Nico/Downloads")

FileNotFoundError: [Errno 2] No such file or directory: 'c:/Users/Nico/Downloads'
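
The error above comes from a hard-coded path that does not exist on this machine. A possible way to avoid that, sketched below, is to build the path from the current user's home folder. It assumes the folder is called "Downloads", which is common but not guaranteed on every system.

```python
import os

home = os.path.expanduser("~")                  # current user's home folder on any OS
downloads = os.path.join(home, "Downloads")     # assumption: the folder is named "Downloads"

try:
    entries = os.listdir(downloads)
    print(f"{len(entries)} entries in {downloads}")
except OSError as err:      # FileNotFoundError and PermissionError are both OSError subclasses
    entries = []
    print(f"Could not list {downloads}: {err}")
```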

In [1]:
import os             # needed again after a kernel restart; without it this cell raises NameError

print(os.name)        # name of the OS-dependent module in use, e.g. 'posix' or 'nt'

In [2]:
# math module

import math

print(math.pi)        # get the value of pi

3.141592653589793


In [3]:
print(f"my res: {math.pi:.4f}") # get value of pi with 4 decimals only

my res: 3.1416


In [57]:
p = math.pi
round(p, 4)         # another way to get pi with 4 decimals: the built-in round() function

3.1416

In [5]:
print(math.log(20))         # get the natural logarithm of 20

2.995732273553991


In [6]:
print(math.sin(math.pi/2))      # get the sine of pi/2

1.0
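
A few more `math` functions in the same spirit, as a short sketch. All the calls are part of the standard `math` module; the input values are arbitrary.

```python
import math

print(math.log(20))         # natural logarithm (base e)
print(math.log10(1000))     # base-10 logarithm
print(math.log2(8))         # base-2 logarithm
print(math.log(8, 2))       # any other base goes in the second argument

print(math.sqrt(16))        # square root, always returned as a float
print(math.floor(2.7), math.ceil(2.2))   # round down / round up to an integer
```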


In [7]:
# random module

import random

print(random.random())      # get a random float in the range [0.0, 1.0)

0.9863071914765612


In [8]:
random.random() * 20

13.610410490134301

In [9]:
int(random.random() * 20)

0

In [10]:
print(random.randint(1, 10))    # get a random integer between 1 and 10 (both included)

8


In [11]:
my_list = ["apple", "banana", "cherry", "pear", "melon"]

print(random.choice(my_list))   # get a random element from a list

melon


In [12]:
print(random.uniform(1, 10))       # get a random float between 1 and 10

2.9903164762889896


In [13]:
# Let's create a simulated random list of temperature reads between 10 and 20 degrees

sim_temps = []

for i in range(10):
    sim_temps.append(random.uniform(10, 20))
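
Simulated data like this changes on every run. When the same numbers are needed again (for example, to compare results), the generator can be seeded first. The sketch below assumes an arbitrary seed value of 42 and uses a list comprehension instead of the loop.

```python
import random

random.seed(42)     # assumption: 42 is an arbitrary seed; any fixed number works
sim_temps = [random.uniform(10, 20) for _ in range(10)]

random.seed(42)     # same seed again, so the same sequence is generated
sim_temps_again = [random.uniform(10, 20) for _ in range(10)]

print(sim_temps == sim_temps_again)                         # True
print(f"min: {min(sim_temps):.2f}, max: {max(sim_temps):.2f}")
```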

In [14]:
# time module

import time

print(time.time())      # current time in seconds since the Unix epoch (1970-01-01 00:00:00 UTC),
                        # the reference point most systems count time from

1702656812.7986848


In [15]:
# Let's measure the time it takes to run a piece of code
# This is used a lot in Data Science to find out how long each step takes
# Ex: this code takes 4 hours, let's try to make it faster or more efficient

start = time.time()

nums = []
for i in range(1000):
    nums.append(i)
    for num in nums:
        num = num ** num

end = time.time()

print(end - start, "seconds")      # get the time it took to run the code above

2.2774620056152344 seconds
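
For timing short pieces of code, the `time` module also offers `time.perf_counter()`, a higher-resolution clock meant for measuring intervals. A minimal sketch, where `build_squares` is a made-up function used only as something to time:

```python
import time

def build_squares(n):
    """Hypothetical workload: the squares of 0..n-1."""
    return [i * i for i in range(n)]

start = time.perf_counter()                 # clock intended for measuring elapsed time
squares = build_squares(100_000)
elapsed = time.perf_counter() - start

print(f"{elapsed:.6f} seconds")
```

Unlike `time.time()`, the value returned by `perf_counter()` has no meaning on its own; only the difference between two calls is useful.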


In [16]:
# time.sleep() pauses the program for the given number of seconds

delay_s = 4

print("Print immediately.")

time.sleep(delay_s)
print(f"Print after {delay_s} seconds.")

Print immediately.
Print after 4 seconds.


In [17]:
# Example: this can be used to put the current time in a file name
# https://docs.python.org/3/library/time.html lists the format codes (%Y, %m, %d, ...)

t = time.localtime()

print(time.strftime("%Y-%m-%d %H:%M:%S", t))        # get the current time in a readable format

2023-12-15 17:13:39
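
As a sketch of the file-naming idea, the example below first formats second 0 of the Unix epoch (a fixed, known value) and then builds a timestamped file name. The `report_` prefix and `.csv` extension are made up for illustration.

```python
import time

# Second 0 of the Unix epoch in UTC, handy as a fixed value to check a format string
epoch_start = time.gmtime(0)
print(time.strftime("%Y-%m-%d %H:%M:%S", epoch_start))     # 1970-01-01 00:00:00

# A compact timestamp without spaces or colons, which are awkward in file names
stamp = time.strftime("%Y%m%d_%H%M%S", time.localtime())
filename = f"report_{stamp}.csv"            # "report_" is a hypothetical prefix
print(filename)
```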


---
## 4. The pip package installer and external libraries

The Python Standard Library is a great resource for learning Python and getting things done quickly. However, it is not the only library available for Python. There are thousands of third-party libraries that you can install and use in your programs.

The most popular way to install third-party libraries in Python is with the pip package installer. pip is a command-line utility that you can use to install, uninstall, and manage Python packages. It is included with the Python installation, so you don’t need to install it separately.

To install a package with pip, you can use the following command in the terminal:

```bash
pip install package_name
```

Let's check that pip is available. In the terminal:

$ pip --version

On macOS, use pip3 instead of pip:

$ pip3 --version

In [36]:
# You can also run terminal commands from a Jupyter Notebook using the ! symbol

!pip3 --version

pip 21.2.4 from /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/site-packages/pip (python 3.9)


In [54]:
!pip3 install numpy

Defaulting to user installation because normal site-packages is not writeable
You should consider upgrading via the '/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip' command.


In [55]:
import numpy

Install numpy

$ pip install numpy

Uninstall numpy

$ pip uninstall numpy

Upgrade numpy 

$ pip install --upgrade numpy

Check all installed libraries

$ pip list

pip freeze also lists the installed libraries, in the name==version format used by requirements files:

$ pip freeze

Freeze is useful to export the list of packages installed in your environment to a file, usually called requirements.txt. This file can then be used by others to install the same packages in their own environment.

$ pip freeze > requirements.txt

On macOS, use pip3:

$ pip3 freeze > requirements.txt
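
From inside Python, the version of an installed package can also be looked up with the Standard Library module `importlib.metadata` (Python 3.8+). The sketch below is a minimal illustration; the helper name `installed_version` and the deliberately fake package name are assumptions made for this example.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

# The last name is intentionally fake, to show the "not installed" case
for name in ["numpy", "pandas", "surely-not-installed-demo-pkg"]:
    print(name, "->", installed_version(name))
```

This can be handy in a notebook to confirm that the interpreter running the kernel is the same one pip installed into.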

In [42]:
!pip3 install --upgrade numpy

Defaulting to user installation because normal site-packages is not writeable
You should consider upgrading via the '/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip' command.


In [44]:
ls

Excercise1day4.py      inverted_image.jpg     my_module.py
__pycache__/           ipld_day01.ipynb       sales.csv
cat_img.jpg            ipld_day02.ipynb       spotify-2023.csv
expensive_houses.csv   ipld_day03.ipynb       stocks_prices.txt
expensive_houses.xlsx  ipld_day04.ipynb       test/
hello.py               ipld_day05.ipynb       test.ipynb
hello_module.py        ipld_day06.ipynb       test.txt
house_data.csv         ipld_day07.ipynb       weatherAUS.csv


In [52]:
! pip list


zsh:1: command not found: pip


In [58]:
cd path/to/your/project


[Errno 2] No such file or directory: 'path/to/your/project'
/Users/nicolasraimundez/Documents/Code - EAE Business School/eae_ipld


In [53]:
pip install -r /path/to/your/project/requirements.txt

Defaulting to user installation because normal site-packages is not writeable
ERROR: Could not open requirements file: [Errno 2] No such file or directory: '/path/to/your/project/requirements.txt'
You should consider upgrading via the '/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


In [56]:
!pip3 install -r requirements.txt # install all pip libraries needed for the project in a single command

Defaulting to user installation because normal site-packages is not writeable
You should consider upgrading via the '/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip' command.


---
## 5. Exercises

#### Ex 1.

Create a new module that will have 2 functions:
- **separate_odd_even(nums)**: The first one will receive a list of integer numbers and return two separate lists, one with the odd numbers and the other one with the even ones.

    for example: separate_odd_even([1,2,3,4,5,6,7,8,9]) -> [1,3,5,7,9], [2,4,6,8]

- **search_max_char(text)**: The second function will receive a text and output the most common letter character in that text and how many times it appears.

    for example: search_max_char("Hello World") -> "l", 3

Then import that module in the following cell and call them with two basic examples.

Make sure to include the module's Python file in your assignment submission!

In [None]:
# Your code here
# make new module like the class example and do the 2 functions

#### Ex 2.

Install the pandas library using pip, import it in the following cell and then run the following line to create a dataframe and prove that it works.

```python
# <do something to install and import pandas>

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
```

In [59]:
!pip3 install pandas

Defaulting to user installation because normal site-packages is not writeable
You should consider upgrading via the '/Applications/Xcode.app/Contents/Developer/usr/bin/python3 -m pip install --upgrade pip' command.


In [60]:
import pandas

In [61]:
import pandas as pd 

In [63]:
# Your code here

import pandas as pd                 # import pandas under the alias pd, the conventional short name

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

In [65]:
df

   col1  col2
0     1     3
1     2     4
