# Series 1: Pandas Basics You Should Know
##### Ing. Jean Didier KOUAKOU /jeandidikouakou@gmail.com/+225 0555420217

### What is Pandas?

- A powerful data manipulation and analysis library for Python.
- Ideal for handling structured data (like spreadsheets or SQL tables).

### Why Use Pandas?

- Simplifies common data analysis tasks such as loading, cleaning, filtering and aggregating data.
- Provides intuitive data structures (Series and DataFrame).
- Integrates with other data science libraries (NumPy, Matplotlib).

### Key Data Structures

#### Series
A one-dimensional labeled array capable of holding any data type.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
print(s)
```

```
a    1
b    2
c    3
d    4
dtype: int64
```
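A Series can be read either by its labels or by integer position. The minimal sketch below reuses the same values as above; `.loc` selects by label and `.iloc` by position.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

by_label = s.loc['b']      # label-based access
by_position = s.iloc[1]    # position-based access (second element)
labels = list(s.index)     # the index labels as a Python list
values = s.to_numpy()      # the underlying NumPy array of values

print(by_label, by_position)
print(labels)
print(values)
```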


#### DataFrame
A two-dimensional labeled data structure whose columns can hold different types.

```python
data = {
    "Name": ['John', 'Joseph', 'Grace', 'Eddy'],
    "Age": [23, 22, 25, 30],
    "Sex": ['M', 'M', 'F', 'M'],
    "Grade": [12, 14, 17, 15]
}
df = pd.DataFrame(data)
print(df)
```

```
     Name  Age Sex  Grade
0    John   23   M     12
1  Joseph   22   M     14
2   Grace   25   F     17
3    Eddy   30   M     15
```
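Because each column can hold a different type, it is often worth checking what pandas inferred. A short sketch on the same data (the variable name `students` is chosen here so it does not overwrite `df`):

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ['John', 'Joseph', 'Grace', 'Eddy'],
    "Age": [23, 22, 25, 30],
    "Sex": ['M', 'M', 'F', 'M'],
    "Grade": [12, 14, 17, 15],
})

print(students.columns.tolist())  # column labels in insertion order
print(students.dtypes)            # one inferred dtype per column
```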


### Basic Operations with DataFrames

#### Loading Data

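```python
# 'file.csv' and 'file.xlsx' are placeholder paths; each call replaces df with the file's contents.

# Reading data from a CSV file:
df = pd.read_csv('file.csv')

# Reading data from an Excel file (.xlsx files need an engine such as openpyxl):
df = pd.read_excel('file.xlsx')
```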
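If you do not have a file at hand, `read_csv` also accepts any file-like object. The sketch below uses `io.StringIO` as a stand-in for `file.csv` (the CSV text simply repeats the student data above and is only for illustration) and shows the `sep` and `usecols` parameters.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'file.csv'
csv_text = """Name,Age,Sex,Grade
John,23,M,12
Joseph,22,M,14
Grace,25,F,17
Eddy,30,M,15
"""

# sep=',' is the default; usecols keeps only the listed columns
loaded = pd.read_csv(io.StringIO(csv_text), sep=',', usecols=['Name', 'Grade'])
print(loaded)
```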

#### Viewing Data

```python
# Displaying the first rows (5 by default; this DataFrame only has 4):
print(df.head())

# Displaying the first two rows (rows 0 and 1):
print(df.head(2))

# Checking the shape (rows, columns):
print(df.shape)
```

```
     Name  Age Sex  Grade
0    John   23   M     12
1  Joseph   22   M     14
2   Grace   25   F     17
3    Eddy   30   M     15
     Name  Age Sex  Grade
0    John   23   M     12
1  Joseph   22   M     14
(4, 4)
```
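Besides `head()` and `shape`, `tail()` shows the last rows, `describe()` summarises the numeric columns and `info()` lists columns, non-null counts and dtypes. A short sketch on the same student data:

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ['John', 'Joseph', 'Grace', 'Eddy'],
    "Age": [23, 22, 25, 30],
    "Sex": ['M', 'M', 'F', 'M'],
    "Grade": [12, 14, 17, 15],
})

last_two = students.tail(2)    # last two rows (index 2 and 3)
summary = students.describe()  # count, mean, std, min, quartiles, max of numeric columns
print(last_two)
print(summary)
students.info()                # prints column names, non-null counts and dtypes
```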


### Selecting Data

#### Selecting columns

```python
# Selecting the Name column of the DataFrame (returns a Series)
print(df['Name'])
```

```
0      John
1    Joseph
2     Grace
3      Eddy
Name: Name, dtype: object
```
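Passing a list of names inside the brackets returns a DataFrame with just those columns, while a single name returns a Series. A minimal sketch:

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ['John', 'Joseph', 'Grace', 'Eddy'],
    "Age": [23, 22, 25, 30],
    "Grade": [12, 14, 17, 15],
})

names = students['Name']              # single label -> Series
subset = students[['Name', 'Grade']]  # list of labels -> DataFrame
print(type(names).__name__, type(subset).__name__)
print(subset)
```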


#### Selecting rows by position

```python
# Printing the first row (position 0)
print(df.iloc[0])
```

```
Name     John
Age        23
Sex         M
Grade      12
Name: 0, dtype: object
```
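`iloc` selects by integer position, whereas `loc` selects by index label (and column name). With the default numeric index the two look alike, so the sketch below uses `Name` as the index to make the difference visible; that choice of index is only for illustration.

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ['John', 'Joseph', 'Grace', 'Eddy'],
    "Age": [23, 22, 25, 30],
    "Grade": [12, 14, 17, 15],
}).set_index('Name')

first_row = students.iloc[0]                  # by position: John's row
grace_grade = students.loc['Grace', 'Grade']  # by label: row 'Grace', column 'Grade'
first_two = students.iloc[0:2]                # positions 0 and 1 (end excluded)
up_to_grace = students.loc['John':'Grace']    # label slices include the end label
print(first_two)
print(up_to_grace)
```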


### Filtering Data

#### Filtering rows by value

```python
# Keeping only the rows where Name equals 'John'
print(df[df['Name'] == 'John'])
```

```
   Name  Age Sex  Grade
0  John   23   M     12
```
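To keep rows matching any of several values, `isin()` avoids chaining multiple `==` comparisons. A minimal sketch:

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ['John', 'Joseph', 'Grace', 'Eddy'],
    "Grade": [12, 14, 17, 15],
})

# Rows whose Name appears in the given list
selected = students[students['Name'].isin(['John', 'Grace'])]
print(selected)
```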


#### Conditional filtering

```python
print(df[df['Grade'] > 14])
```

```
    Name  Age Sex  Grade
2  Grace   25   F     17
3   Eddy   30   M     15
```
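Conditions can be combined with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `>` or `==`. A sketch on the same data:

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ['John', 'Joseph', 'Grace', 'Eddy'],
    "Age": [23, 22, 25, 30],
    "Sex": ['M', 'M', 'F', 'M'],
    "Grade": [12, 14, 17, 15],
})

# Male students with a grade above 14
male_high = students[(students['Sex'] == 'M') & (students['Grade'] > 14)]
# Students younger than 23 or older than 28
age_band = students[(students['Age'] < 23) | (students['Age'] > 28)]
print(male_high)
print(age_band)
```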


### Data Manipulation

#### Adding Columns

```python
# Adding a column from a list of values you supply (one value per row):
# df['New_column'] = ['...', '...', '...', '...']
df['Result'] = ['Fail', 'Fail', 'Pass', 'Pass']
print(df)
```

```
     Name  Age Sex  Grade Result
0    John   23   M     12   Fail
1  Joseph   22   M     14   Fail
2   Grace   25   F     17   Pass
3    Eddy   30   M     15   Pass
```
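Instead of typing the `Result` values by hand, they can be computed from `Grade` with `numpy.where`. The pass mark of 15 below is an assumption chosen because it reproduces the values listed above; the notebook does not state the rule.

```python
import numpy as np
import pandas as pd

students = pd.DataFrame({
    "Name": ['John', 'Joseph', 'Grace', 'Eddy'],
    "Grade": [12, 14, 17, 15],
})

PASS_MARK = 15  # hypothetical threshold, not given in the original
# 'Pass' where the condition holds, 'Fail' elsewhere
students['Result'] = np.where(students['Grade'] >= PASS_MARK, 'Pass', 'Fail')
print(students)
```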


```python
# Deriving a new column from an existing one:
# df['New_column'] = df['...'] + your modification
df['Mark'] = df['Grade'] * 5
print(df)
```

```
     Name  Age Sex  Grade Result  Mark
0    John   23   M     12   Fail    60
1  Joseph   22   M     14   Fail    70
2   Grace   25   F     17   Pass    85
3    Eddy   30   M     15   Pass    75
```
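`assign()` adds or overwrites columns and returns a new DataFrame, leaving the original untouched, which is convenient for chaining. A sketch reproducing the `Mark` column (the factor 5 comes from the cell above):

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ['John', 'Joseph', 'Grace', 'Eddy'],
    "Grade": [12, 14, 17, 15],
})

# Returns a new DataFrame; students itself is not modified
with_marks = students.assign(Mark=lambda d: d['Grade'] * 5)
print(with_marks)
print('Mark' in students.columns)  # the original has no Mark column
```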


#### Removing Columns

```python
# Dropping a column (axis=1 targets columns; inplace=True modifies df directly)
df.drop('Grade', axis=1, inplace=True)
print(df)
```

```
     Name  Age Sex Result  Mark
0    John   23   M   Fail    60
1  Joseph   22   M   Fail    70
2   Grace   25   F   Pass    85
3    Eddy   30   M   Pass    75
```
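`drop` can also return a new DataFrame instead of modifying in place. The `columns=` keyword is an equivalent spelling of `axis=1`, and a list removes several columns at once:

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ['John', 'Joseph', 'Grace', 'Eddy'],
    "Age": [23, 22, 25, 30],
    "Sex": ['M', 'M', 'F', 'M'],
    "Grade": [12, 14, 17, 15],
})

slim = students.drop(columns=['Age', 'Sex'])  # new DataFrame without Age and Sex
print(slim)
print(students.shape)  # the original keeps all 4 columns: (4, 4)
```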


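```python
df
```

|   | Name | Age | Sex | Result | Mark |
|---|------|-----|-----|--------|------|
| 0 | John | 23 | M | Fail | 60 |
| 1 | Joseph | 22 | M | Fail | 70 |
| 2 | Grace | 25 | F | Pass | 85 |
| 3 | Eddy | 30 | M | Pass | 75 |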


#### Data Aggregation

```python
# Grouping rows by Age and counting the non-null values of every other column
group = df.groupby('Age').count()
print(group)
```

```
     Name  Sex  Result  Mark
Age                         
22      1    1       1     1
23      1    1       1     1
25      1    1       1     1
30      1    1       1     1
```
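Since every age in this small dataset is unique, grouping by `Age` gives one row per student. Grouping by a column with repeated values, such as `Sex`, and aggregating a numeric column is usually more informative. A sketch with `agg`:

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ['John', 'Joseph', 'Grace', 'Eddy'],
    "Sex": ['M', 'M', 'F', 'M'],
    "Mark": [60, 70, 85, 75],
})

# One row per value of Sex, with several statistics of Mark
stats = students.groupby('Sex')['Mark'].agg(['count', 'mean', 'max'])
print(stats)
```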


### Handling Missing Data

#### Identifying Missing Values

```python
# Counting missing values (NaN) in each column:
print(df.isnull().sum())
```

```
Name      0
Age       0
Sex       0
Result    0
Mark      0
dtype: int64
```
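The student data has no gaps, so every count above is 0. To see the check find something, the sketch below builds a small DataFrame with deliberately missing entries (the `np.nan` positions are made up for illustration):

```python
import numpy as np
import pandas as pd

students = pd.DataFrame({
    "Name": ['John', 'Joseph', 'Grace', 'Eddy'],
    "Age": [23, np.nan, 25, 30],
    "Grade": [12, 14, np.nan, np.nan],
})

missing_per_column = students.isnull().sum()               # NaN count per column
rows_with_missing = students[students.isnull().any(axis=1)]  # rows with at least one NaN
print(missing_per_column)
print(rows_with_missing)
```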


#### Filling Missing Values

```python
# Filling missing values with a constant (here 0):
df.fillna(0, inplace=True)
```
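A single constant is not always appropriate. `fillna` also accepts a dictionary mapping column names to fill values; the sketch fills missing ages with the column mean, which is just one possible choice, and missing grades with 0.

```python
import numpy as np
import pandas as pd

students = pd.DataFrame({
    "Name": ['John', 'Joseph', 'Grace', 'Eddy'],
    "Age": [23, np.nan, 25, 30],
    "Grade": [12, 14, np.nan, np.nan],
})

# A different fill value per column; without inplace, fillna returns a new DataFrame
filled = students.fillna({'Age': students['Age'].mean(), 'Grade': 0})
print(filled)
```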

#### Dropping Missing Values

```python
# Dropping every row that contains at least one missing value:
df.dropna(inplace=True)
```
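By default `dropna` removes every row containing at least one missing value; the `subset` parameter restricts that check to the listed columns. A minimal sketch:

```python
import numpy as np
import pandas as pd

students = pd.DataFrame({
    "Name": ['John', 'Joseph', 'Grace', 'Eddy'],
    "Age": [23, np.nan, 25, 30],
    "Grade": [12, 14, np.nan, np.nan],
})

any_missing = students.dropna()                # keeps only rows with no NaN at all
age_missing = students.dropna(subset=['Age'])  # only looks at the Age column
print(any_missing)
print(age_missing)
```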

### Exporting Data

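```python
# Writing to a CSV file:
df.to_csv('file.csv', index=False)

# Writing to an Excel file (.xlsx files need an engine such as openpyxl):
df.to_excel('file.xlsx', index=False)

# Writing to a JSON file; orient='records' writes one object per row without the index
# (index=False is not supported with the default orient='columns'):
df.to_json('file.json', orient='records')
```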
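To check what an export produces without writing files, the same methods can write to an in-memory buffer, or return a string when no path is given. A sketch of a CSV round trip and a JSON export with `orient='records'`:

```python
import io
import pandas as pd

students = pd.DataFrame({
    "Name": ['John', 'Grace'],
    "Mark": [60, 85],
})

buffer = io.StringIO()
students.to_csv(buffer, index=False)  # write CSV text into the buffer
buffer.seek(0)                        # rewind before reading it back
restored = pd.read_csv(buffer)

json_text = students.to_json(orient='records')  # no path -> returns a JSON string
print(restored)
print(json_text)
```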

### End