### Day 31 of programming

### Python Tutorial: Introduction to Pandas
Pandas is a library that provides two primary data structures:

Series: A one-dimensional labeled array capable of holding any data type.

DataFrame: A two-dimensional labeled data structure with columns of potentially different types.

To use Pandas, you first need to install it. If you haven't already, you can install it via pip:

In [None]:
pip install pandas


### Step 1: Importing Pandas
Start by importing the Pandas library, conventionally under the alias `pd`:

In [1]:
import pandas as pd


### Step 2: Creating Series
A Series is like a single column of data: a sequence of values, each paired with an index label.

#### Creating a Series from a List

In [2]:
import pandas as pd

data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)


0    10
1    20
2    30
3    40
4    50
dtype: int64
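
Because a Series behaves like a column, arithmetic and aggregations apply to all of its values at once. The following is a minimal sketch reusing the same list as above; the variable names `doubled` and `total` are only illustrative.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# Arithmetic is applied element by element and returns a new Series.
doubled = series * 2

# Aggregations reduce the whole Series to a single value.
total = series.sum()

print(doubled)
print(total)
```

The original `series` is left unchanged; `series * 2` builds a new object.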


#### Creating a Series with Custom Index

In [3]:
data = [10, 20, 30, 40, 50]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)
print(series)


a    10
b    20
c    30
d    40
e    50
dtype: int64
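
With a custom index, values can be looked up either by label or by position. As a small sketch of the difference (the variable names are illustrative), `.loc` works with labels and `.iloc` works with integer positions:

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

by_label = series.loc['b']         # label-based lookup
by_position = series.iloc[1]       # position-based lookup (second element)
label_slice = series.loc['b':'d']  # label slices include both endpoints

print(by_label, by_position)
print(label_slice)
```

Note that a label slice such as `'b':'d'` includes `'d'`, unlike an ordinary Python slice by position.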


### Step 3: Creating DataFrames
A DataFrame is a two-dimensional table with labeled axes (rows and columns).

#### Creating a DataFrame from a Dictionary

In [4]:
data = {
    'Name': ['John', 'Anna', 'Peter'],
    'Age': [28, 24, 35],
    'City': ['New York', 'Paris', 'Berlin']
}

df = pd.DataFrame(data)
print(df)


    Name  Age      City
0   John   28  New York
1   Anna   24     Paris
2  Peter   35    Berlin
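
Each column of a DataFrame is itself a Series, and each column can have its own data type. A short sketch to check this on the same data (the name `age_column` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter'],
    'Age': [28, 24, 35],
    'City': ['New York', 'Paris', 'Berlin'],
})

# Selecting one column returns a Series that shares the DataFrame's row index.
age_column = df['Age']
print(type(age_column).__name__)

# dtypes lists the data type of every column.
print(df.dtypes)
```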


#### Creating a DataFrame from a List of Dictionaries

In [6]:
data = [
    {'Name': 'John', 'Age': 28, 'City': 'New York'},
    {'Name': 'Anna', 'Age': 24, 'City': 'Paris'},
    {'Name': 'Peter', 'Age': 35, 'City': 'Berlin'}
]

df = pd.DataFrame(data)
print(df)

    Name  Age      City
0   John   28  New York
1   Anna   24     Paris
2  Peter   35    Berlin
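
One practical difference with a list of dictionaries is that the records do not all need the same keys. The sketch below uses a deliberately incomplete variant of the data (the gaps are made up for illustration) to show that Pandas fills missing entries with `NaN`:

```python
import pandas as pd

records = [
    {'Name': 'John', 'Age': 28, 'City': 'New York'},
    {'Name': 'Anna', 'Age': 24},          # no 'City' key in this record
    {'Name': 'Peter', 'City': 'Berlin'},  # no 'Age' key in this record
]

df_partial = pd.DataFrame(records)
print(df_partial)

# Count the missing values in each column.
print(df_partial.isna().sum())
```

Because `NaN` is a floating-point value, the `Age` column is stored as floats here rather than integers.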


### Step 4: Basic DataFrame Operations
The attributes and methods below inspect a DataFrame's contents and structure.

In [9]:
print(df.tail())  # Displays the last 5 rows by default (all 3 rows here)
print(df.shape)  # Prints (number of rows, number of columns)
print(df.columns)  # Prints the column names
print(df.index)  # Prints the row index
print(df['Name'])  # Prints the 'Name' column as a Series
print(df[['Name', 'Age']])  # Prints the 'Name' and 'Age' columns as a DataFrame


    Name  Age      City
0   John   28  New York
1   Anna   24     Paris
2  Peter   35    Berlin
(3, 3)
Index(['Name', 'Age', 'City'], dtype='object')
RangeIndex(start=0, stop=3, step=1)
0     John
1     Anna
2    Peter
Name: Name, dtype: object
    Name  Age
0   John   28
1   Anna   24
2  Peter   35
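
The operations above select columns. Rows can be selected in a similar way, by position, by label, or with a boolean condition. A minimal sketch on the same data (the threshold of 25 and the variable names are arbitrary choices for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter'],
    'Age': [28, 24, 35],
    'City': ['New York', 'Paris', 'Berlin'],
})

first_row = df.iloc[0]   # row at position 0, returned as a Series
row_label_1 = df.loc[1]  # row whose index label is 1

# A boolean condition keeps only the rows where it is True.
over_25 = df[df['Age'] > 25]

# .loc can combine a row condition with a column selection.
names_over_25 = df.loc[df['Age'] > 25, 'Name']

print(over_25)
print(names_over_25)
```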


### Step 5: Data Manipulation
#### Adding a New Column

In [10]:
df['Country'] = ['USA', 'France', 'Germany']
print(df)


    Name  Age      City  Country
0   John   28  New York      USA
1   Anna   24     Paris   France
2  Peter   35    Berlin  Germany
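
A new column does not have to come from a literal list; it can also be computed from existing columns. The sketch below assumes two made-up column names, `Age in 5 Years` and `Over30`, purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter'],
    'Age': [28, 24, 35],
})

# Element-wise arithmetic on a column yields a Series of the same length.
df['Age in 5 Years'] = df['Age'] + 5  # hypothetical column name

# assign() returns a new DataFrame and leaves df itself unchanged.
flagged = df.assign(Over30=df['Age'] > 30)  # 'Over30' is a hypothetical name

print(flagged)
```

Direct assignment modifies `df` in place, while `assign()` is convenient when the original should stay untouched.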


#### Removing a Column

In [11]:
df = df.drop('Country', axis=1)  # Drops the 'Country' column
print(df)


    Name  Age      City
0   John   28  New York
1   Anna   24     Paris
2  Peter   35    Berlin
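
`drop` returns a new DataFrame rather than modifying the original, which is why the result is assigned back to `df` above. As a sketch of an equivalent spelling, the `columns=` keyword avoids having to remember `axis=1` (the names `trimmed` and `same` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter'],
    'Age': [28, 24, 35],
    'Country': ['USA', 'France', 'Germany'],
})

# Equivalent to df.drop('Country', axis=1); df itself is not changed.
trimmed = df.drop(columns=['Country'])

print(list(df.columns))
print(list(trimmed.columns))

# errors='ignore' skips labels that are absent instead of raising KeyError.
same = trimmed.drop(columns=['Country'], errors='ignore')
```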


#### Renaming Columns

In [12]:
df = df.rename(columns={'Name': 'Full Name', 'Age': 'Age (years)'})
print(df)


  Full Name  Age (years)      City
0      John           28  New York
1      Anna           24     Paris
2     Peter           35    Berlin
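
Besides a dictionary, `rename` also accepts a function that is applied to every column name. A brief sketch, starting again from the original column names (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter'],
    'Age': [28, 24, 35],
    'City': ['New York', 'Paris', 'Berlin'],
})

# A dictionary renames only the listed columns; the others keep their names.
renamed = df.rename(columns={'Name': 'Full Name'})

# A function is applied to every column name, here converting to lowercase.
lowered = df.rename(columns=str.lower)

print(list(renamed.columns))
print(list(lowered.columns))
```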


### Step 6: Aggregation and Grouping
#### Basic Aggregations

A numeric column supports the usual summary statistics, such as the mean, median, and standard deviation:

In [13]:
print(df['Age (years)'].mean())  # Mean of 'Age (years)'
print(df['Age (years)'].median())  # Median of 'Age (years)'
print(df['Age (years)'].std())  # Sample standard deviation (ddof=1) of 'Age (years)'


29.0
28.0
5.5677643628300215
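
Grouping applies the same aggregations separately to each group of rows. The tutorial data has no repeated values to group on, so the sketch below assumes a slightly extended, made-up table in which two people share a country; the row for 'Marie' and the `Country` values are illustrative only.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Marie'],  # 'Marie' is a made-up extra row
    'Age': [28, 24, 35, 31],
    'Country': ['USA', 'France', 'Germany', 'France'],
})

# Split the rows by 'Country', then take the mean of 'Age' within each group.
mean_age = people.groupby('Country')['Age'].mean()
print(mean_age)

# agg() computes several statistics per group in one call.
summary = people.groupby('Country')['Age'].agg(['count', 'mean', 'max'])
print(summary)
```

The result is indexed by the grouping column, so a single group can be read with, for example, `mean_age['France']`.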
