<h1 align='center'> Basic libraries </h1>

So far we have used some of Python's built-in functions and some classes that are defined by default. However, we can import external modules to gain access to additional functions, methods and classes. These modules are commonly called libraries, and they are brought into a Python session with the `import` statement. We will see some basic examples.

<h2> Maths </h2>

In [1]:
import math

The `math` module includes a set of mathematical tools, such as the square root, the factorial, the ceiling of a decimal number, and so on. Here are some examples of what this library provides:

In [14]:
print(math.sqrt(16)) # square root
print(math.ceil(3.1416)) # ceiling: the smallest integer greater than or equal to the number
print(math.log(10)) # natural logarithm
print(math.sin(0)) # sine function
print(math.cos(0)) # cosine function

4.0
4
2.302585092994046
0.0
1.0
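
The text above mentions the factorial, which the cell does not show. A minimal sketch of it, together with the floor as a counterpart to the ceiling; the variable names are illustrative and not part of the original notebook:

```python
import math

# Functions mentioned in the text but not shown in the cell above.
factorial_of_5 = math.factorial(5)   # 5! = 5 * 4 * 3 * 2 * 1
floor_value = math.floor(3.1416)     # largest integer <= 3.1416
ceil_value = math.ceil(3.1416)       # smallest integer >= 3.1416

print(factorial_of_5)  # 120
print(floor_value)     # 3
print(ceil_value)      # 4
```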


<h2> Random </h2>

In [15]:
import random

The `random` module provides functions to generate random numbers or to pick a random element from a given collection:

In [18]:
print(random.choice([4,6,10,2,5,6])) # choose a random element of a list-like object
print(random.choice([4,6,10,2,5,6])) # choose a random element of a list-like object
print(random.choice([4,6,10,2,5,6])) # choose a random element of a list-like object

2
6
4


Since we are choosing a random element, the output can differ every time we run the code above. However, if we want a program that relies on randomness to give the same result on every run, we can use the `seed()` function, which fixes the sequence of random values that follows. Try running the following code several times:

In [21]:
random.seed(57428)
print(random.choice([4,6,10,2,5,6]))

5
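
To see the effect of the seed inside a single run, one can reset it and repeat the same draws. This is a small sketch; `first_run` and `second_run` are illustrative names, and no particular values are claimed for the draws themselves:

```python
import random

values = [4, 6, 10, 2, 5, 6]

random.seed(57428)              # same seed as in the cell above
first_run = [random.choice(values) for _ in range(3)]

random.seed(57428)              # resetting the seed restarts the sequence
second_run = [random.choice(values) for _ in range(3)]

print(first_run == second_run)  # True
```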


We can also generate a random integer in a given interval (say, between 0 and 100, both endpoints included) with the `randint()` function:

In [24]:
random.randint(0, 100)

31
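
Note that `randint()` can return either endpoint of the interval. A quick way to convince oneself is to draw many values from a very small interval and look at the minimum and maximum; the interval and the number of draws below are arbitrary choices for illustration:

```python
import random

# Draw many integers from a small interval and inspect the range of results.
draws = [random.randint(0, 2) for _ in range(1000)]

print(sorted(set(draws)))      # the distinct values that were drawn
print(min(draws), max(draws))  # never below 0, never above 2
```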

<h2> Numpy </h2>

NumPy stands for "Numerical Python". It is a library that provides more sophisticated tools than `math` for working with mathematical objects, some of which we will study later on and which are often used in data science.

Unlike the modules we have seen so far, `numpy` is not part of the standard library, so it cannot be imported without installing it first. If we try, we will get the following error:

<img src='img/np_error.png'> 

So, to install a library, we run `pip install numpy` or `conda install numpy` from the terminal, depending on our preferred package manager. For some basic differences between the two, see: https://www.anaconda.com/blog/understanding-conda-and-pip
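
Before installing anything, it can be handy to check from Python whether a module is already available. The sketch below uses `importlib.util.find_spec` from the standard library; `is_installed` is a hypothetical helper written for this example, and it is only meant for top-level module names:

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None

print(is_installed("math"))   # True: part of the standard library
print(is_installed("numpy"))  # True only if numpy has been installed
```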

Once the package is installed, we can import it:

In [28]:
import numpy as np

In some cases it is advisable to give a library an alias when importing it; this is what the `as` keyword does. In addition, certain libraries have conventional aliases, and it is good practice to keep using them, although technically we could use any valid name as an alias. Whenever we introduce a new library during the course, we will specify its recommended alias.
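
An alias is just another name for the same module object. The following sketch uses `math` so that it runs without installing anything; the alias `m` is purely illustrative and not a community convention:

```python
import math
import math as m   # "m" is an illustrative alias, not a standard one

print(m is math)   # True: both names refer to the same module object
print(m.sqrt(16))  # 4.0, the same result as math.sqrt(16)
```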

Now let's see some of the functions `numpy` gives access to (some of which are also included in `math`):

In [40]:
print(np.sin(0)) # sine function
print(np.cos(0)) # cosine function
print(np.sqrt(16)) # square root
print(np.abs(-6)) # absolute value
print(np.sum([7,8,4,2,5])) # sum all the elements of a list-like object
print(np.prod([1,2,3,4,5,6,7,8,9,10])) # multiply all the elements in a list-like object

0.0
1.0
4.0
6
26
3628800


However, `numpy` is more powerful than this: it also provides tools for storing data (like those we studied in the Data Structure notebook) through array objects:

In [42]:
an_array = np.array([4,5,6,7,8,9,0,10,14, 20])
print(an_array)

[ 4  5  6  7  8  9  0 10 14 20]


Although this looks like a list, it is not. Arrays are powerful objects, and most of what we will do in data science is based on them.
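
One visible difference between a list and an array is how they respond to arithmetic. A minimal sketch, assuming `numpy` is installed; the names `small_list` and `small_array` are illustrative:

```python
import numpy as np

small_list = [1, 2, 3]
small_array = np.array([1, 2, 3])

doubled_list = small_list * 2     # repeats the list
doubled_array = small_array * 2   # multiplies each element by 2

print(doubled_list)        # [1, 2, 3, 1, 2, 3]
print(doubled_array)       # [2 4 6]
print(type(small_array))   # <class 'numpy.ndarray'>
```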

**N.B.:** Did you know that the product of an empty set of numbers is 1? Let's check it with `numpy`:

In [44]:
np.prod([])

1.0

¡Ta-Da!
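
One way to see why 1 is the natural answer: splitting a collection into two parts should not change its product, and that only works for an empty part if its product is 1. The sketch below uses `math.prod`, which assumes Python 3.8 or later; the lists are arbitrary examples:

```python
import math

first = [2, 3]
second = [4, 5]

# Splitting a product into two parts must not change the result.
split_ok = math.prod(first + second) == math.prod(first) * math.prod(second)
print(split_ok)  # True

# Applying the same rule with an empty part forces prod([]) to be 1.
empty_product = math.prod([])
print(empty_product)  # 1
print(math.prod(first + []) == math.prod(first) * empty_product)  # True
```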

<h2> Pandas </h2>

`pandas` is by far the most useful library we will use as data scientists, since it gives us a more comfortable way to view stored data. Luckily for us, `pandas` is also very compatible with `numpy` and dictionaries.

To use `pandas` we also have to install it, the same way we did with `numpy`. Once that is done, we can import it like this:

In [45]:
import pandas as pd

First, let's look at why `pandas` is so commonly used and what makes it so powerful in data science. We will store some data in a dictionary and then load it into a pandas DataFrame, which is also an object for data storage:

In [46]:
people_dict = {
    'names': ['John', 'Alfred', 'Alex'],
    'surnames': ['Smith', 'Wick', 'Chapman']
}

pd.DataFrame.from_dict(people_dict)

Unnamed: 0,names,surnames
0,John,Smith
1,Alfred,Wick
2,Alex,Chapman
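
The compatibility with dictionaries and `numpy` goes in both directions. A small sketch, assuming `pandas` is installed; `people_df` is an illustrative name, and the plain `pd.DataFrame(...)` constructor is used here as an alternative to `from_dict`:

```python
import pandas as pd

people_dict = {
    'names': ['John', 'Alfred', 'Alex'],
    'surnames': ['Smith', 'Wick', 'Chapman']
}
people_df = pd.DataFrame(people_dict)  # the constructor also accepts a dict

print(people_df.shape)                  # (3, 2): three rows, two columns
print(people_df['names'].tolist())      # ['John', 'Alfred', 'Alex']

# Back to a dictionary of lists, and to a numpy array.
print(people_df.to_dict(orient='list') == people_dict)  # True
print(type(people_df.to_numpy()))       # <class 'numpy.ndarray'>
```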


Table-like data storage is very popular because it resembles Microsoft Excel, which is what most of us first used to manipulate data. It is also very easy to read the information when there are several rows and columns. Although the previous example is a very small dataset, in real life we might encounter thousands of rows and many columns, which would be very difficult to view on screen in a dictionary or in several lists or tuples. However, it is important to mention that these data storage classes are not perfect substitutes for one another; each has its time and place, and we will use them as needed.

A more commonly used function in `pandas` is `read_csv()`, which imports .csv files from our local computer; there are others, such as `read_excel()`, for importing Excel tables. To illustrate this, I will import the gdp_per_capita.csv file, which contains information on several countries and which I downloaded from [Kaggle](https://www.kaggle.com/datasets/abhilashanil/better-life-index-and-gross-domestic-product?resource=download) to my computer:

In [49]:
df = pd.read_csv('Data/gdp_per_capita.csv', encoding='latin1')
df

Unnamed: 0,Country,Subject Descriptor,Units,Scale,Country/Series-specific Notes,2015,Estimates Start After
0,Afghanistan,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,599.994,2013.0
1,Albania,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,3995.383,2010.0
2,Algeria,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,4318.135,2014.0
3,Angola,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,4100.315,2014.0
4,Antigua and Barbuda,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,14414.302,2011.0
...,...,...,...,...,...,...,...
185,Vietnam,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,2088.344,2012.0
186,Yemen,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,1302.940,2008.0
187,Zambia,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,1350.151,2010.0
188,Zimbabwe,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,1064.350,2012.0


By running the `df` variable we display the first and last five rows of the dataset, but sometimes seeing the first rows is enough, so we can use the `head()` method:

In [51]:
df.head(10) 

Unnamed: 0,Country,Subject Descriptor,Units,Scale,Country/Series-specific Notes,2015,Estimates Start After
0,Afghanistan,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,599.994,2013.0
1,Albania,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,3995.383,2010.0
2,Algeria,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,4318.135,2014.0
3,Angola,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,4100.315,2014.0
4,Antigua and Barbuda,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,14414.302,2011.0
5,Argentina,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,13588.846,2013.0
6,Armenia,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,3534.86,2014.0
7,Australia,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,50961.865,2014.0
8,Austria,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,43724.031,2015.0
9,Azerbaijan,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,5739.433,2014.0


Here I have told `pandas` to show the first 10 rows of the dataset, but the default is five:

In [52]:
df.head()

Unnamed: 0,Country,Subject Descriptor,Units,Scale,Country/Series-specific Notes,2015,Estimates Start After
0,Afghanistan,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,599.994,2013.0
1,Albania,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,3995.383,2010.0
2,Algeria,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,4318.135,2014.0
3,Angola,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,4100.315,2014.0
4,Antigua and Barbuda,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,14414.302,2011.0


Similarly, we can retrieve the last rows with the `tail()` method:

In [53]:
df.tail()

Unnamed: 0,Country,Subject Descriptor,Units,Scale,Country/Series-specific Notes,2015,Estimates Start After
185,Vietnam,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,2088.344,2012.0
186,Yemen,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,1302.94,2008.0
187,Zambia,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,1350.151,2010.0
188,Zimbabwe,Gross domestic product per capita current prices,U.S. dollars,Units,See notes for: Gross domestic product current...,1064.35,2012.0
189,International Monetary Fund World Economic Out...,,,,,,
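
To try `read_csv()`, `head()` and `tail()` without downloading a file, one can read a CSV held in memory. This is a sketch with made-up data: `csv_text` and `toy_df` are illustrative, and the columns have nothing to do with the GDP dataset:

```python
import io
import pandas as pd

# A tiny made-up CSV, held in memory instead of on disk.
csv_text = """country,value
A,1
B,2
C,3
D,4
E,5
F,6
G,7
"""

toy_df = pd.read_csv(io.StringIO(csv_text))  # read_csv accepts file-like objects

print(len(toy_df))                          # 7
print(toy_df.head()['country'].tolist())    # ['A', 'B', 'C', 'D', 'E']
print(toy_df.head(2)['country'].tolist())   # ['A', 'B']
print(toy_df.tail(3)['country'].tolist())   # ['E', 'F', 'G']
```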


<h2> References </h2>

- https://docs.python.org/3/library/math.html
- https://docs.python.org/3/library/random.html
- https://numpy.org/
- https://pandas.pydata.org/