<h1>Part 1 : Data structures in pandas</h1>

In Pandas, there are two primary data structures:
- Series: A one-dimensional data structure that represents a column or a row in a table. It has the following components:
    - index: Labels for each data point, providing a way to access and manipulate the data.
    - values: The actual data points.
    - name (optional): A label for the Series as a whole. It becomes the column name or the row label when the Series is placed in a DataFrame.
- DataFrames: A two-dimensional data structure that represents a table. It is essentially a collection of Series with a shared index. Key features include:
    - columns: Each column in a DataFrame is a Series, sharing the same index.
    - index: Labels for the rows, allowing for easy access and alignment of data.
    - values: The data stored in a tabular format.

A DataFrame can be thought of as a flexible container, similar to an Excel spreadsheet or a database table, on which you can perform operations such as filtering, grouping, merging, and analysis of structured data.
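
As a minimal sketch of the structure described above, the following cell builds a small DataFrame, shows that one column is a Series sharing the row labels, and applies a simple filter. The sample values and the variable names `df`, `surface`, and `large` are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame(
    {"Surface": [120, 80, 150], "Rooms": [3, 2, 4]},
    index=["House1", "House2", "House3"],
)

# Selecting one column returns a Series that keeps the row labels
surface = df["Surface"]
print(type(surface).__name__)          # Series
print(surface.index.equals(df.index))  # True

# Filtering: keep only the houses with a surface greater than 100
large = df[df["Surface"] > 100]
print(list(large.index))               # ['House1', 'House3']
```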

1. Importing the Pandas library

In [35]:
import pandas as pd

<h3> Series</h3>

1. Creating a Series from a homogeneous list

In [93]:
# Creating a list of house surface areas
# These are the values of the Series
surfaces = [120, 80, 150, 200, 90]

In [94]:
# Creating a Series from the list of values
# The name of the Series ('Surface') is optional
s_surfaces = pd.Series(surfaces, name='Surface')
s_surfaces

0    120
1     80
2    150
3    200
4     90
Name: Surface, dtype: int64

In [95]:
# The default index is not meaningful: it is simply a sequence of integers
s_surfaces.index

RangeIndex(start=0, stop=5, step=1)

In [96]:
# We can improve the Series index
# by specifying meaningful labels, for example the names of the houses
# The labels are passed as a list
s_surfaces = pd.Series(surfaces, name='Surface', 
                                index=['Maison1', 'Maison2', 'Maison3', 'Maison4', 'Maison5'])
s_surfaces

Maison1    120
Maison2     80
Maison3    150
Maison4    200
Maison5     90
Name: Surface, dtype: int64
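
Once the index carries meaningful labels, values can be looked up by label as well as by position. The following sketch rebuilds the same Series and uses `.loc` (label-based) and `.iloc` (position-based); the variable name `subset` is introduced here only for illustration.

```python
import pandas as pd

s_surfaces = pd.Series(
    [120, 80, 150, 200, 90],
    name="Surface",
    index=["Maison1", "Maison2", "Maison3", "Maison4", "Maison5"],
)

# Access by label
print(s_surfaces.loc["Maison3"])   # 150

# Access by integer position (0 is the first element)
print(s_surfaces.iloc[0])          # 120

# Passing a list of labels returns a smaller Series
subset = s_surfaces.loc[["Maison2", "Maison4"]]
print(subset.tolist())             # [80, 200]
```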

2. Creating a Series from a dictionary (the keys become the index)

In [98]:
# Creating a dictionary that contains information about a house
# For example, we specify the surface and the number of rooms
house = {'Surface': 150, 'Rooms': 3}

In [99]:
# Creating a Series from the dictionary
s_house = pd.Series(house, name='House1')

In [100]:
s_house

Surface    150
Rooms        3
Name: House1, dtype: int64

In [101]:
# Get house index
s_house.index

Index(['Surface', 'Rooms'], dtype='object')

In [102]:
# Get house values
s_house.values

array([150,   3], dtype=int64)

In [103]:
# Get house name
s_house.name

'House1'
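
In the example above both values are integers, so the Series has dtype `int64`. As a small sketch of the heterogeneous case, the cell below adds a text value; the `'City'` key and its value are hypothetical and not part of the original data. With mixed types, pandas falls back to the generic `object` dtype.

```python
import pandas as pd

# 'City' and 'Tunis' are hypothetical values added for illustration
s_mixed = pd.Series({"Surface": 150, "Rooms": 3, "City": "Tunis"}, name="House1")

# The dictionary keys become the index
print(list(s_mixed.index))   # ['Surface', 'Rooms', 'City']

# Integers and text together are stored with the generic object dtype
print(s_mixed.dtype)         # object
```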

<h3>Dataframes</h3>

A dataframe can be created in several ways. Three of them are shown here :
- First way : Create a dataframe based on columns
- Second way : Create a dataframe based on rows
- Third way : Create a dataframe from a CSV file

1. First way : Create dataframe based on columns

For example, we can follow these steps :
- Create a list of house names
- Create one series per attribute of the houses, in our example :
    - s_surfaces : a series of surfaces, indexed by the list of names
    - s_rooms : a series of room counts, indexed by the list of names

- Combine the 2 series s_surfaces and s_rooms together as columns of one dataframe.

In [110]:
# Creating a list for the names of houses
names = ['House1','House2','House3','House4','House5']

# Creating series for surfaces of houses
s_surfaces = pd.Series([120, 80, 150, 200, 90], name='Surface', index=names)

# Creating series for rooms of houses
s_rooms = pd.Series([3, 2, 4, 5, 2], name='Rooms', index=names)

# Combining the 2 series s_surfaces and s_rooms together as columns of one dataframe
df_houses = pd.DataFrame({'Surface': s_surfaces, 'Rooms': s_rooms})

df_houses

        Surface  Rooms
House1      120      3
House2       80      2
House3      150      4
House4      200      5
House5       90      2
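
As an alternative sketch of the column-based approach, `pd.concat` with `axis=1` places the Series side by side and takes the column names from each Series' `name`. This is not the method used in the cell above, only another way to reach the same table; the variable names `df_dict` and `df_concat` are introduced here for illustration.

```python
import pandas as pd

names = ["House1", "House2", "House3", "House4", "House5"]
s_surfaces = pd.Series([120, 80, 150, 200, 90], name="Surface", index=names)
s_rooms = pd.Series([3, 2, 4, 5, 2], name="Rooms", index=names)

# Dictionary approach: the keys give the column names
df_dict = pd.DataFrame({"Surface": s_surfaces, "Rooms": s_rooms})

# concat approach: the Series names give the column names
df_concat = pd.concat([s_surfaces, s_rooms], axis=1)

print(list(df_concat.columns))   # ['Surface', 'Rooms']
print(list(df_concat.index) == names)   # True
```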


2. Second way : Create dataframe based on rows

For example, we can follow these steps :
- Create one series per house, in our example :
    - s_house1 : a series for the 1st house
    - s_house2 : a series for the 2nd house
    - ...

- Combine all the house series s_house1, s_house2, s_house3, ... together as rows of one dataframe. The name of each series becomes its row label.

In [111]:
# Creating series for each house
s_house1 = pd.Series({'Surface': 120, 'Rooms': 3}, name='House1')
s_house2 = pd.Series({'Surface': 80, 'Rooms': 2}, name='House2')
s_house3 = pd.Series({'Surface': 150, 'Rooms': 4}, name='House3')
s_house4 = pd.Series({'Surface': 200, 'Rooms': 5}, name='House4')
s_house5 = pd.Series({'Surface': 90, 'Rooms': 2}, name='House5')

# Combining the series (rows) into a DataFrame
df_houses = pd.DataFrame([s_house1, s_house2, s_house3, s_house4, s_house5])

# Displaying the DataFrame
df_houses

        Surface  Rooms
House1      120      3
House2       80      2
House3      150      4
House4      200      5
House5       90      2
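
The row-based construction can also be read in reverse: selecting one row with `.loc` gives back a Series whose name is the row label and whose index is the list of columns. The sketch below uses only two of the houses to stay short; the variable names `df_rows` and `row` are introduced here for illustration.

```python
import pandas as pd

s_house1 = pd.Series({"Surface": 120, "Rooms": 3}, name="House1")
s_house3 = pd.Series({"Surface": 150, "Rooms": 4}, name="House3")

# Each Series becomes one row, labeled by its name
df_rows = pd.DataFrame([s_house1, s_house3])

# Selecting a row by label returns a Series again
row = df_rows.loc["House3"]
print(row.name)           # House3
print(list(row.index))    # ['Surface', 'Rooms']
print(row["Surface"])     # 150
```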


3. Third way : Create a dataframe from a CSV file

In [113]:
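# Create a dataframe df_houses from a CSV file
# index_col=0 : use the first column as the row labels
# header=0 : use the first line as the column names
df_houses = pd.read_csv('houses.csv',
                        index_col=0,
                        header=0)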
df_houses

        Surface  Rooms
Name
House1      120      3
House2       80      2
House3      150      4
House4      200      5
House5       90      2
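
The content of `houses.csv` is not shown in this notebook. The sketch below assumes a file whose first line is `Name,Surface,Rooms`, which is consistent with the output above, and reads it from an in-memory string with `io.StringIO` so that the cell runs without any file on disk. The `csv_text` variable and the two sample lines are assumptions made for illustration.

```python
import io

import pandas as pd

# Assumed file content, shortened to two houses
csv_text = """Name,Surface,Rooms
House1,120,3
House2,80,2
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text), index_col=0, header=0)

print(df.index.name)               # Name
print(list(df.columns))            # ['Surface', 'Rooms']
print(df.loc["House2", "Rooms"])   # 2
```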


In [114]:
# Get the dataframe index
df_houses.index

Index(['House1', 'House2', 'House3', 'House4', 'House5'], dtype='object', name='Name')

In [115]:
# Get the dataframe columns
df_houses.columns

Index(['Surface', 'Rooms'], dtype='object')

In [116]:
# Get the dataframe values
df_houses.values

array([[120,   3],
       [ 80,   2],
       [150,   4],
       [200,   5],
       [ 90,   2]], dtype=int64)

<h1>Exercise</h1>

Create a dataframe containing the following clients, using each of the 3 ways seen above :
- Client 1 : name = 'Ahmed' , salary = 1200.65 , age = 28
- Client 2 : name = 'Sarra' , salary = 1800.87 , age = 35
- Client 3 : name = 'Rahma' , salary = 1500.75 , age = 30

Note :
- The dataframe index must be the client names

In [None]:
# Based on first way (columns -> dataframe)


In [None]:
# Based on second way (rows -> dataframe)


In [None]:
# Based on third way (CSV -> dataframe)
