### Pandas

In this notebook we are going to learn some `pandas` basics for working with data. We are going to look at the following topics:

1. Input and Output ✔
2. Pandas Series
3. DataFrames
4. Pandas Arrays
5. Index Objects


If you don't have pandas installed you can install it by running the following command:

```shell
!pip install pandas
```

> Note that if you are using `Anaconda` you don't need to install pandas, as it comes preinstalled.

In the following code cell we are going to import `pandas` with an alias `pd` and check the version of pandas.

In [2]:
import pandas as pd
pd.__version__

'1.5.3'

### Input and Output 

In this section we are going to look at `io` in pandas, which is basically reading data from files and writing data to files.

1. `CSV` files - these are files in which columns of data are separated by commas, which is why the format is called `Comma Separated Values`. We are going to read from and write to csv files.

> To read a `csv` file in pandas we use the `pd.read_csv` function. This function returns a pandas dataframe.

In [3]:
dataframe = pd.read_csv('files/airquality.csv')
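
If you don't have `files/airquality.csv` at hand, the following is a minimal sketch of the same call on an in-memory file. The `csv_text` string and the `sample` name are made up for illustration; only a few rows are included, with one row missing its first two values.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = """Ozone,Solar.R,Wind,Temp,Month,Day
41,190,7.4,67,5,1
36,118,8.0,72,5,2
,,14.3,56,5,5
"""

# read_csv accepts a path or a file-like object; StringIO wraps the text.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                   # (number of rows, number of columns)
print(sample.columns.tolist())        # column names taken from the first line
print(sample['Ozone'].isna().sum())   # empty fields are read as missing values
```

Empty fields in the text are read as `NaN`, which is why a column with missing values is shown with floats such as `41.0`.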

We can check the first `5` rows of data in the `dataframe` object using the `head()` method.

In [4]:
dataframe.head()

Unnamed: 0.1,Unnamed: 0,Ozone,Solar.R,Wind,Temp,Month,Day
0,1,41.0,190.0,7.4,67,5,1
1,2,36.0,118.0,8.0,72,5,2
2,3,12.0,149.0,12.6,74,5,3
3,4,18.0,313.0,11.5,62,5,4
4,5,,,14.3,56,5,5


We can check the last `5` rows of data in the `dataframe` object using the `tail()` method.

In [6]:
dataframe.tail()

Unnamed: 0.1,Unnamed: 0,Ozone,Solar.R,Wind,Temp,Month,Day
148,149,30.0,193.0,6.9,70,9,26
149,150,,145.0,13.2,77,9,27
150,151,14.0,191.0,14.3,75,9,28
151,152,18.0,131.0,8.0,76,9,29
152,153,20.0,223.0,11.5,68,9,30


> You can pass the number of rows you want to the `head()` or `tail()` method.

In [7]:
dataframe.head(10)

Unnamed: 0.1,Unnamed: 0,Ozone,Solar.R,Wind,Temp,Month,Day
0,1,41.0,190.0,7.4,67,5,1
1,2,36.0,118.0,8.0,72,5,2
2,3,12.0,149.0,12.6,74,5,3
3,4,18.0,313.0,11.5,62,5,4
4,5,,,14.3,56,5,5
5,6,28.0,,14.9,66,5,6
6,7,23.0,299.0,8.6,65,5,7
7,8,19.0,99.0,13.8,59,5,8
8,9,8.0,19.0,20.1,61,5,9
9,10,,194.0,8.6,69,5,10


In [8]:
dataframe.tail(7)

Unnamed: 0.1,Unnamed: 0,Ozone,Solar.R,Wind,Temp,Month,Day
146,147,7.0,49.0,10.3,69,9,24
147,148,14.0,20.0,16.6,63,9,25
148,149,30.0,193.0,6.9,70,9,26
149,150,,145.0,13.2,77,9,27
150,151,14.0,191.0,14.3,75,9,28
151,152,18.0,131.0,8.0,76,9,29
152,153,20.0,223.0,11.5,68,9,30
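
To see `head()` and `tail()` on their own, here is a small sketch on a made-up dataframe of 20 rows. The `numbers` dataframe and its column `n` are assumptions used only for this example.

```python
import pandas as pd

# Hypothetical dataframe with a single column holding 0..19.
numbers = pd.DataFrame({'n': range(20)})

first_default = numbers.head()   # no argument: first 5 rows
first_three = numbers.head(3)    # first 3 rows
last_two = numbers.tail(2)       # last 2 rows, original index labels kept

print(len(first_default))
print(first_three['n'].tolist())
print(last_two.index.tolist())
```

Note that `tail()` keeps the original index labels, which matches the output above where the last rows are labelled `146` to `152`.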


### Dataframe

In this section we are going to have a look at how we can create our own dataframe from an iterable using the `DataFrame` class. Let's start by creating a dataframe from a `list` of `dictionaries` as follows:

In [10]:
users = [
    {'gender': 'male', 'name': 'Jonh', 'age': 56, 'surname': 'Doe'},
    {'gender': 'female', 'name': 'Mary', 'age': 26, 'surname': 'Jack'},
    {'gender': 'male', 'name': 'Peter', 'age': 34, 'surname': 'Gross'}
]

df = pd.DataFrame(users)
df

Unnamed: 0,gender,name,age,surname
0,male,Jonh,56,Doe
1,female,Mary,26,Jack
2,male,Peter,34,Gross
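
A list of dictionaries describes the data row by row. As a sketch of an alternative, the same table can be described column by column with a dictionary of lists. The names `by_rows` and `by_columns` are made up for this example.

```python
import pandas as pd

# Row-oriented: one dictionary per row.
by_rows = pd.DataFrame([
    {'name': 'Mary', 'age': 26},
    {'name': 'Peter', 'age': 34},
])

# Column-oriented: one list per column.
by_columns = pd.DataFrame({
    'name': ['Mary', 'Peter'],
    'age': [26, 34],
})

# Both constructions give the same dataframe.
print(by_rows.equals(by_columns))
```

Which form to use mostly depends on how the data arrives: records from an API tend to be row-oriented, while computed columns are often easier to pass as lists.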


We can also use a `list` of `lists` to create a pandas dataframe. Let's have a look at the following example.

> When using a `list` of `lists` we pass the column names through the `columns` argument; otherwise the columns are labelled with integers.

In [19]:
# Convert the dictionary views to plain lists so that we really have a list of lists.
rows = [list(user.values()) for user in users]
headers = list(users[0].keys())
headers, rows

(['gender', 'name', 'age', 'surname'],
 [['male', 'Jonh', 56, 'Doe'],
  ['female', 'Mary', 26, 'Jack'],
  ['male', 'Peter', 34, 'Gross']])

In [20]:
df = pd.DataFrame(rows, columns=headers)
df

Unnamed: 0,gender,name,age,surname
0,male,Jonh,56,Doe
1,female,Mary,26,Jack
2,male,Peter,34,Gross
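
To see what the `columns` argument does, here is a short sketch that builds a dataframe from the same rows with and without it. The `sample_rows`, `unlabelled` and `labelled` names are made up for this example.

```python
import pandas as pd

# Hypothetical rows: each inner list is one row of the table.
sample_rows = [['Mary', 26], ['Peter', 34]]

unlabelled = pd.DataFrame(sample_rows)                         # no column names given
labelled = pd.DataFrame(sample_rows, columns=['name', 'age'])  # explicit column names

print(unlabelled.columns.tolist())   # integer labels
print(labelled.columns.tolist())     # the names we passed
```

Without `columns`, pandas falls back to integer column labels, so the data is still there but harder to refer to.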


The dataframe object has various methods for saving data to files. Let's save our dataframe to a file in a folder called `saved`.

In [22]:
import os

# Create the output folder if it does not exist yet.
os.makedirs('saved', exist_ok=True)

Let's save our dataframe to a `.csv` file in the folder called `saved`.

> By specifying `index=False` we are telling pandas that we don't want the index written as a column in our `csv` file.

In [25]:
df.to_csv('saved/people.csv', index=False)
print('Saved')

Saved
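
The effect of `index=False` is easiest to see by comparing the text that `to_csv` produces. The sketch below uses a small made-up `people` dataframe and calls `to_csv` without a path, in which case it returns the CSV text as a string instead of writing a file.

```python
import io
import pandas as pd

# Hypothetical dataframe used only for this comparison.
people = pd.DataFrame({'name': ['Mary', 'Peter'], 'age': [26, 34]})

with_index = people.to_csv()                # index written as the first column
without_index = people.to_csv(index=False)  # index left out

print(with_index.splitlines()[0])     # header line starts with an empty field
print(without_index.splitlines()[0])  # header line holds only the column names

# Reading the text back shows where an 'Unnamed: 0' column comes from.
reloaded = pd.read_csv(io.StringIO(with_index))
print(reloaded.columns.tolist())
```

When the index is written, its header field is empty, so `read_csv` names that column `Unnamed: 0` on the way back in. This is likely how the `Unnamed: 0` column ended up in the air quality data shown earlier.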


### Refs

1. [pandas.pydata.org](https://pandas.pydata.org/docs/user_guide/index.html)