## 1. Introduction to Pandas

To use Pandas, you first need to import it. By convention, Pandas is imported with the alias pd.

In [1]:
import pandas as pd

## 2. Series

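A Series is a one-dimensional labeled array that can hold any data type (integers, floats, strings, Python objects, and so on). Every value has an index label. When no index is given, Pandas uses a default integer index starting at 0, as the first output below shows.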

### Creating a Series

In [2]:
# Creating a Series from a list
data = [1, 2, 3, 4, 5]
s = pd.Series(data)
print(s)

# Creating a Series with custom indices
s = pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])
print(s)

0    1
1    2
2    3
3    4
4    5
dtype: int64
a    1
b    2
c    3
d    4
e    5
dtype: int64
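Because every value carries a label, a Series can be read by label or by position, and arithmetic between two Series matches values by label rather than by position. The sketch below illustrates this with two small illustrative Series, `s1` and `s2`, whose labels only partly overlap.

```python
import pandas as pd

# Two illustrative Series whose index labels only partly overlap
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

print(s1['b'])     # label-based lookup
print(s1.iloc[0])  # position-based lookup

# Arithmetic aligns on labels; labels present in only one Series produce NaN
total = s1 + s2
print(total)
```

The result has the union of both indexes (`a` through `d`), with NaN wherever one side has no matching label.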


## 3. DataFrame

A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

### Creating a DataFrame

#### From a dictionary

In [3]:
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


#### From a list of dictionaries

In [4]:
# Creating a DataFrame from a list of dictionaries
data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'}
]
df = pd.DataFrame(data)
print(df)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
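The records in a list of dictionaries do not have to share the same keys. In this sketch the records are illustrative: the DataFrame gets one column per key seen in any record, and keys missing from a record become NaN.

```python
import pandas as pd

# Illustrative records with different keys
records = [
    {'Name': 'Alice', 'Age': 25},
    {'Name': 'Bob', 'City': 'Los Angeles'},
]
df_records = pd.DataFrame(records)
print(df_records)
```

Since the `Age` column now contains a NaN, Pandas stores it as `float64` instead of `int64`.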


#### From a CSV file

In [None]:
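# Creating a DataFrame by reading a CSV file
# 'data.csv' is a placeholder path. This cell was not run, so the cells below
# still use the DataFrame built from the list of dictionaries above.
df = pd.read_csv('data.csv')
print(df)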
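No `data.csv` file comes with this notebook, so here is a self-contained sketch that reads the same three rows from an in-memory string. `pd.read_csv` accepts any file-like object as well as a path, and `io.StringIO` wraps the string so it behaves like a file. The CSV contents are illustrative.

```python
import io
import pandas as pd

# Illustrative CSV text standing in for a file on disk
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# read_csv parses the header row into column names and infers each column's dtype
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
print(df_csv.dtypes)
```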

## 4. Basic DataFrame Operations

### 1. Viewing Data

In [6]:
# Viewing the first few rows
print(df.head())
print('\n')
# Viewing the last few rows
print(df.tail())
print('\n')
# Viewing a summary of the DataFrame
# info() prints its report directly and returns None, so it is not wrapped in print()
df.info()
print('\n')
# Viewing basic statistics for the numeric columns
print(df.describe())

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
 2   City    3 non-null      object
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes


        Age
count   3.0
mean   30.0
std     5.0
min    25.0
25%    27.5
50%    30.0
75%    32.5
max    35.0
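Here `head()` and `tail()` return the whole DataFrame because they default to 5 rows and this one has only 3. The sketch below, using an illustrative copy named `df_view`, passes an explicit row count, checks the dimensions with `shape`, and asks `describe()` to cover the non-numeric columns as well.

```python
import pandas as pd

df_view = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

print(df_view.shape)          # (number of rows, number of columns)
print(list(df_view.columns))  # column labels
print(df_view.head(2))        # only the first 2 rows

# By default describe() summarizes numeric columns only; include='all' adds the rest
print(df_view.describe(include='all'))
```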


### 2. Selecting Columns

In [7]:
# Selecting a single column
print(df['Name'])
print('\n')
# Selecting multiple columns
print(df[['Name', 'Age']])

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object


      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
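The two forms above return different types. A single label gives a Series, and a list of labels gives a DataFrame, even when the list contains only one label. The sketch uses a small illustrative DataFrame named `df_sel`.

```python
import pandas as pd

df_sel = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

one = df_sel['Name']         # single label -> Series
also_one = df_sel[['Name']]  # list with one label -> DataFrame with one column
print(type(one).__name__)
print(type(also_one).__name__)
```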


### 3. Selecting Rows

In [8]:
# Selecting rows by integer position (the end of a slice is excluded)
print(df.iloc[0])    # First row
print(df.iloc[0:2])  # Positions 0 and 1
print('\n')

# Selecting rows by index label (the end of a slice is included)
print(df.loc[0])     # Row labeled 0
print(df.loc[0:1])   # Labels 0 through 1
print('\n')

# Boolean indexing
print(df[df['Age'] > 30])

Name       Alice
Age           25
City    New York
Name: 0, dtype: object
    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles


Name       Alice
Age           25
City    New York
Name: 0, dtype: object
    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles


      Name  Age     City
2  Charlie   35  Chicago
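Boolean conditions can be combined with `&` (and), `|` (or), and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. The sketch below, on an illustrative copy named `df_rows`, also shows `.loc` filtering rows and choosing columns in one call, plus `isin()` for membership tests.

```python
import pandas as pd

df_rows = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Age at least 30 AND not in Chicago
mask = (df_rows['Age'] >= 30) & (df_rows['City'] != 'Chicago')
print(df_rows[mask])

# .loc[row selector, column selector]
older = df_rows.loc[df_rows['Age'] > 25, ['Name', 'City']]
print(older)

# Rows whose City is one of the listed values
in_cities = df_rows[df_rows['City'].isin(['New York', 'Chicago'])]
print(in_cities)
```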


### 4. Adding and Removing Columns

In [9]:
# Adding a new column
df['Country'] = 'USA'
print(df)
print('\n')

# Removing a column
df = df.drop(columns=['Country'])
print(df)

      Name  Age         City Country
0    Alice   25     New York     USA
1      Bob   30  Los Angeles     USA
2  Charlie   35      Chicago     USA


      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
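A new column can also be computed from existing ones instead of holding a constant. The sketch below uses illustrative names (`df_cols`, `AgeNextYear`, `IsOver26`). It contrasts direct assignment, which changes the DataFrame in place, with `assign()` and `drop()`, which both return a new DataFrame and leave the original untouched.

```python
import pandas as pd

df_cols = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Direct assignment adds the column to df_cols itself
df_cols['AgeNextYear'] = df_cols['Age'] + 1

# assign() returns a new DataFrame; df_cols does not gain IsOver26
df_more = df_cols.assign(IsOver26=df_cols['Age'] > 26)

# drop() also returns a new DataFrame; df_more keeps AgeNextYear
df_less = df_more.drop(columns=['AgeNextYear'])
print(df_more)
print(df_less)
```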


### 5. Modifying Data

In [10]:
# Modifying a single value
df.at[0, 'Age'] = 26
print(df)
print('\n')

# Modifying an entire column
df['Age'] = df['Age'] + 1
print(df)


      Name  Age         City
0    Alice   26     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


      Name  Age         City
0    Alice   27     New York
1      Bob   31  Los Angeles
2  Charlie   36      Chicago
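To change only the rows that meet a condition, pass a boolean mask and a column label to a single `.loc` call. Chained indexing such as `df['Group'][mask] = ...` may modify a temporary copy instead of the DataFrame, so the single `.loc` form is the safer choice. The `df_mod` DataFrame and the `Group` column here are illustrative.

```python
import pandas as pd

df_mod = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [26, 30, 35],
})

# Start with a default label, then overwrite it where the condition holds
df_mod['Group'] = 'under 30'
df_mod.loc[df_mod['Age'] >= 30, 'Group'] = '30 and over'

# In-place arithmetic on the selected cells only
df_mod.loc[df_mod['Name'] == 'Bob', 'Age'] += 1
print(df_mod)
```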


## 5. Handling Missing Data

### 1. Identifying Missing Data

In [11]:
# Detecting missing values
print(df.isnull())
print('\n')

# Summarizing missing values
print(df.isnull().sum())


    Name    Age   City
0  False  False  False
1  False  False  False
2  False  False  False


Name    0
Age     0
City    0
dtype: int64
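Every count above is zero because `df` has no missing values. The sketch below uses an illustrative DataFrame, `df_na`, that contains `None` and `np.nan`, which Pandas both treats as missing. It also uses `isna()`, which is the same method as `isnull()` under another name.

```python
import numpy as np
import pandas as pd

df_na = pd.DataFrame({
    'Name': ['Alice', 'Bob', None],
    'Age': [25, np.nan, 35],
    'City': ['New York', 'Los Angeles', None],
})

per_column = df_na.isna().sum()          # missing values per column
rows_with_gaps = df_na.isna().any(axis=1)  # True for rows with at least one missing value
present = df_na.notna().sum().sum()      # total number of non-missing cells
print(per_column)
print(rows_with_gaps)
print(present)
```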


### 2. Dropping and Filling Missing Values

In [12]:
# Dropping rows with missing values (df has none, so every row is kept)
df_cleaned = df.dropna()
print(df_cleaned)
print('\n')

# Filling missing values with 0 (also a no-op here, since nothing is missing)
df_filled = df.fillna(0)
print(df_filled)


      Name  Age         City
0    Alice   27     New York
1      Bob   31  Los Angeles
2  Charlie   36      Chicago


      Name  Age         City
0    Alice   27     New York
1      Bob   31  Los Angeles
2  Charlie   36      Chicago
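To see these methods actually change something, the sketch below uses an illustrative DataFrame with gaps. It shows `dropna()` with no arguments, `dropna(subset=...)` to consider only selected columns, and `fillna()` with a dictionary so each column gets its own fill value (the mean for `Age`, a placeholder string for `City`). The fill choices are illustrative.

```python
import numpy as np
import pandas as pd

df_na = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 35],
    'City': ['New York', None, None],
})

any_gap_dropped = df_na.dropna()             # drops every row with any missing value
age_dropped = df_na.dropna(subset=['Age'])   # drops rows only when Age is missing

# Per-column fill values
filled = df_na.fillna({'Age': df_na['Age'].mean(), 'City': 'Unknown'})
print(any_gap_dropped)
print(age_dropped)
print(filled)
```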


## 6. DataFrame Operations

### 1. Sorting

In [13]:
# Sorting by column
df_sorted = df.sort_values(by='Age')
print(df_sorted)


      Name  Age         City
0    Alice   27     New York
1      Bob   31  Los Angeles
2  Charlie   36      Chicago
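The rows above were already in age order, so sorting changes nothing. The sketch below uses illustrative data (the `Dept` column and the name `Dana` are hypothetical) to show a descending sort and a sort on several keys, where `ascending` takes one flag per key.

```python
import pandas as pd

df_sort = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Dept': ['B', 'A', 'B', 'A'],
    'Age': [27, 31, 36, 25],
})

# Oldest first
by_age_desc = df_sort.sort_values(by='Age', ascending=False)
print(by_age_desc)

# Dept ascending, then Age descending within each Dept
by_dept = df_sort.sort_values(by=['Dept', 'Age'], ascending=[True, False])
print(by_dept)
```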


### 2. Grouping

In [14]:
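# Grouping and aggregating data
# Select the numeric column first: since pandas 2.0, mean() over the text column Name raises a TypeError
grouped = df.groupby('City')[['Age']].mean()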
print(grouped)


              Age
City             
Chicago      36.0
Los Angeles  31.0
New York     27.0
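In the example above each city has one row, so every group mean is just that row's value. With several rows per group, `agg()` can compute multiple statistics in a single call. The city and age values in `df_grp` are illustrative.

```python
import pandas as pd

df_grp = pd.DataFrame({
    'City': ['New York', 'Chicago', 'New York', 'Chicago'],
    'Age': [27, 36, 33, 30],
})

# One row per city, one column per statistic; group keys are sorted by default
stats = df_grp.groupby('City')['Age'].agg(['mean', 'min', 'max', 'count'])
print(stats)
```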


### 3. Merging and Joining

In [15]:
# Creating another DataFrame
data2 = {
    'Name': ['Alice', 'Bob', 'David'],
    'Salary': [50000, 60000, 70000]
}
df2 = pd.DataFrame(data2)

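# Merging DataFrames on the shared 'Name' column
# how='inner' keeps only names present in both, so Charlie and David are dropped
merged_df = pd.merge(df, df2, on='Name', how='inner')
print(merged_df)
print('\n')

# Joining DataFrames: join() matches on the index, so move 'Name' into the index first
df1 = df.set_index('Name')
df2 = df2.set_index('Name')
joined_df = df1.join(df2, how='inner')
print(joined_df)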

    Name  Age         City  Salary
0  Alice   27     New York   50000
1    Bob   31  Los Angeles   60000


       Age         City  Salary
Name                           
Alice   27     New York   50000
Bob     31  Los Angeles   60000
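An inner merge silently drops unmatched rows. The sketch below rebuilds the two tables under illustrative names (`left`, `right`) and compares `how='left'`, which keeps every row of the left table, with `how='outer'`, which keeps rows from both sides. Passing `indicator=True` adds a `_merge` column that records where each row came from.

```python
import pandas as pd

left = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [27, 31, 36]})
right = pd.DataFrame({'Name': ['Alice', 'Bob', 'David'], 'Salary': [50000, 60000, 70000]})

# Charlie has no salary, so his Salary becomes NaN
left_merged = pd.merge(left, right, on='Name', how='left')
print(left_merged)

# _merge is 'both', 'left_only', or 'right_only'
outer = pd.merge(left, right, on='Name', how='outer', indicator=True)
print(outer)
```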


## 7. Advanced DataFrame Operations

### 1. Applying Functions

In [16]:
# Applying a function to a column
df['Age'] = df['Age'].apply(lambda x: x + 1)
print(df)

# Applying a function to every element of the DataFrame
# DataFrame.map was added in pandas 2.1; on older versions use df.applymap(str)
df_str = df.map(str)
print(df_str)

      Name  Age         City
0    Alice   28     New York
1      Bob   32  Los Angeles
2  Charlie   37      Chicago
      Name Age         City
0    Alice  28     New York
1      Bob  32  Los Angeles
2  Charlie  37      Chicago
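`apply()` calls a Python function once per element. For simple arithmetic, the vectorized form gives the same result without that per-element call, so it is usually preferred. `apply()` is still convenient for arbitrary one-off logic, such as mapping values to labels. The `ages` Series is illustrative.

```python
import pandas as pd

ages = pd.Series([27, 31, 36])

# Same result two ways
via_apply = ages.apply(lambda x: x + 1)
vectorized = ages + 1
print(via_apply.equals(vectorized))

# A custom Python function applied element by element
labels = ages.apply(lambda x: 'even' if x % 2 == 0 else 'odd')
print(labels.tolist())
```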


## 8. Working with Time Series Data 

### 1. Creating a Date Range

In [17]:
# Creating a date range
date_rng = pd.date_range(start='2021-01-01', end='2021-01-10', freq='D')
print(date_rng)

DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
               '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
               '2021-01-09', '2021-01-10'],
              dtype='datetime64[ns]', freq='D')
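You can also give `date_range` a number of `periods` instead of an `end` date, and change the frequency. In this sketch, `'B'` generates business days (Monday to Friday), so it skips the weekend after Friday 2021-01-01. `'W'` generates weekly dates anchored on Sundays, so the range starts at the first Sunday on or after the start date.

```python
import pandas as pd

daily = pd.date_range(start='2021-01-01', periods=3, freq='D')
business = pd.date_range(start='2021-01-01', periods=3, freq='B')
weekly = pd.date_range(start='2021-01-01', periods=3, freq='W')

for rng in (daily, business, weekly):
    print([d.strftime('%Y-%m-%d') for d in rng])
```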


### 2. Creating a DataFrame with DateTime Index

In [18]:
# Creating a DataFrame with a DateTime index
df_dates = pd.DataFrame(date_rng, columns=['date'])
df_dates['data'] = range(1, len(df_dates) + 1)
df_dates = df_dates.set_index('date')
print(df_dates)

            data
date            
2021-01-01     1
2021-01-02     2
2021-01-03     3
2021-01-04     4
2021-01-05     5
2021-01-06     6
2021-01-07     7
2021-01-08     8
2021-01-09     9
2021-01-10    10
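A DatetimeIndex lets you select rows with date strings and group rows into regular time bins. The sketch below rebuilds the same ten-day data under the illustrative name `df_ts`. It slices it with `.loc`, where both endpoints are included, and uses `resample('5D')` to sum the values in two 5-day bins.

```python
import pandas as pd

date_rng = pd.date_range(start='2021-01-01', end='2021-01-10', freq='D')
df_ts = pd.DataFrame({'data': list(range(1, len(date_rng) + 1))}, index=date_rng)

# Rows from Jan 3 through Jan 5, inclusive
window = df_ts.loc['2021-01-03':'2021-01-05']
print(window)

# Bins starting Jan 1 and Jan 6
five_day = df_ts.resample('5D').sum()
print(five_day)
```

The first bin sums 1 through 5 and the second sums 6 through 10.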
