### Assignment 1: DataFrame Creation and Indexing

1. Create a Pandas DataFrame with 4 columns and 6 rows filled with random integers. Set the index to be the first column.
2. Create a Pandas DataFrame with columns 'A', 'B', 'C' and index 'X', 'Y', 'Z'. Fill the DataFrame with random integers and access the element at row 'Y' and column 'B'.

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Question 1.
# Create DataFrame with 4 columns and 6 rows (random integers)
df = pd.DataFrame(np.random.randint(1, 100, size=(6, 4)),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])

# Set the index as the first column
df = df.set_index('Col1')

print(df)

      Col2  Col3  Col4
Col1                  
65      81    67    89
75      15    26    36
1       75    96    88
66       9    47    16
87       9     4    74
5       69    16    61


In [3]:
# Question 2.
# DataFrame with given rows and columns
df2 = pd.DataFrame(np.random.randint(1, 100, size=(3, 3)),
                   columns=['A', 'B', 'C'],
                   index=['X', 'Y', 'Z'])

print(df2)

# Accessing element at row 'Y' and column 'B'
value = df2.loc['Y', 'B']
print("\nElement at row 'Y' and column 'B':", value)

    A   B   C
X  94  82  91
Y  66  77  16
Z  35  80  53

Element at row 'Y' and column 'B': 77
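
Both cells above draw from `np.random.randint` without a seed, so the numbers change on every run. A minimal sketch, assuming NumPy's newer `Generator` API is acceptable here, seeds the generator for reproducible output (the seed `0` is arbitrary). It also shows two scalar accessors: `.at` (by label) and `.iat` (by position) read the same cell as `df2.loc['Y', 'B']`.

```python
import numpy as np
import pandas as pd

# Seeded generator: rerunning the cell reproduces the same integers (seed chosen arbitrarily)
rng = np.random.default_rng(seed=0)
df2 = pd.DataFrame(rng.integers(1, 100, size=(3, 3)),
                   columns=['A', 'B', 'C'],
                   index=['X', 'Y', 'Z'])

via_loc = df2.loc['Y', 'B']  # label-based, as in the cell above
via_at = df2.at['Y', 'B']    # label-based, scalar-only accessor
via_iat = df2.iat[1, 1]      # position-based: second row, second column

print(df2)
print(via_loc, via_at, via_iat)
```

`.at`/`.iat` only accept a single row and column, which is why pandas recommends them when you need exactly one value.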


### Assignment 2: DataFrame Operations

1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Add a new column that is the product of the first two columns.
2. Create a Pandas DataFrame with 3 columns and 4 rows filled with random integers. Compute the row-wise and column-wise sum.


In [4]:
# Question 1.

# Create DataFrame
df = pd.DataFrame(np.random.randint(1, 10, size=(5, 3)),
                  columns=['A', 'B', 'C'])

# Add a new column which is the product of A and B
df['Product'] = df['A'] * df['B']

print(df)

   A  B  C  Product
0  9  8  2       72
1  7  5  6       35
2  8  8  6       64
3  3  8  3       24
4  1  9  6        9
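
`df['Product'] = ...` modifies `df` in place. If the original frame should stay untouched, `DataFrame.assign` returns a new frame with the extra column instead. The sketch below hard-codes the first three rows of the output above so the result is predictable.

```python
import pandas as pd

# First three rows of the output above, hard-coded for a repeatable example
df = pd.DataFrame({'A': [9, 7, 8], 'B': [8, 5, 8], 'C': [2, 6, 6]})

# assign() returns a new DataFrame; df itself is not modified
with_product = df.assign(Product=df['A'] * df['B'])

print(with_product)
print('Product' in df.columns)  # False
```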


In [5]:
# Question 2.
# Create DataFrame
df2 = pd.DataFrame(np.random.randint(1, 10, size=(4, 3)),
                   columns=['X', 'Y', 'Z'])

print("Original DataFrame:")
print(df2)

# Row-wise sum → sum across columns
row_sum = df2.sum(axis=1)
print("\nRow-wise Sum:")
print(row_sum)

# Column-wise sum → sum down rows
col_sum = df2.sum(axis=0)
print("\nColumn-wise Sum:")
print(col_sum)

Original DataFrame:
   X  Y  Z
0  5  3  4
1  3  6  8
2  4  5  4
3  7  1  7

Row-wise Sum:
0    12
1    17
2    13
3    15
dtype: int64

Column-wise Sum:
X    19
Y    15
Z    23
dtype: int64
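
The `axis` argument names the axis that gets collapsed: `axis=1` (alias `'columns'`) gives one total per row, and `axis=0` (alias `'index'`) gives one total per column. A small sketch using the values printed above also appends both totals to a copy of the table as margins; the `'RowTotal'` and `'ColTotal'` labels are illustrative names only.

```python
import pandas as pd

# Same numbers as the "Original DataFrame" output above
df2 = pd.DataFrame({'X': [5, 3, 4, 7], 'Y': [3, 6, 5, 1], 'Z': [4, 8, 4, 7]})

row_sum = df2.sum(axis='columns')   # same as axis=1
col_sum = df2.sum(axis='index')     # same as axis=0

# Attach the totals as margins on a copy of the table
summary = df2.copy()
summary['RowTotal'] = row_sum
summary.loc['ColTotal'] = summary.sum()  # column totals, including the grand total

print(summary)
```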


### Assignment 3: Data Cleaning

1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Introduce some NaN values. Fill the NaN values with the mean of the respective columns.
2. Create a Pandas DataFrame with 4 columns and 6 rows filled with random integers. Introduce some NaN values. Drop the rows with any NaN values.

In [8]:
# Question 1.
df = pd.DataFrame(np.random.randint(1, 20, size=(5, 3)),
                  columns=['A', 'B', 'C'])

# Introduce some NaN values
df.iloc[1, 0] = np.nan
df.iloc[3, 2] = np.nan

print('DataFrame with NaN values:')
print(df)
# Fill NaN with column-wise mean
df_filled = df.fillna(df.mean())

print('\nDataFrame with NaN values filled by column means:')
print(df_filled)

DataFrame with NaN values:
      A   B     C
0  11.0  18  14.0
1   NaN   2  19.0
2  15.0   6   5.0
3  19.0   7   NaN
4  19.0   7  19.0

DataFrame with NaN values filled by column means:
      A   B      C
0  11.0  18  14.00
1  16.0   2  19.00
2  15.0   6   5.00
3  19.0   7  14.25
4  19.0   7  19.00
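
`df.mean()` skips NaN by default (`skipna=True`), and `fillna` lines that Series up with the DataFrame's columns by label, so each gap is filled from its own column. A sketch using the exact values shown above confirms the arithmetic: column A's mean is (11 + 15 + 19 + 19) / 4 = 16.0 and column C's is (14 + 19 + 5 + 19) / 4 = 14.25.

```python
import numpy as np
import pandas as pd

# Same values as the frame with NaN values printed above
df = pd.DataFrame({'A': [11, np.nan, 15, 19, 19],
                   'B': [18, 2, 6, 7, 7],
                   'C': [14, 19, 5, np.nan, 19]})

print(df.isna().sum())            # missing values per column: A 1, B 0, C 1

column_means = df.mean()          # skipna=True by default: A 16.0, B 8.0, C 14.25
df_filled = df.fillna(column_means)

print(df_filled)
print(df_filled.isna().sum().sum())  # 0
```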


In [10]:
# Question 2.
df2 = pd.DataFrame(np.random.randint(1, 50, size=(6, 4)),
                   columns=['W', 'X', 'Y', 'Z'])

# Introduce NaN values
df2.iloc[0, 2] = np.nan
df2.iloc[4, 1] = np.nan
print('DataFrame with NaN values:')
print(df2)
# Drop rows containing any NaN value
df2_cleaned = df2.dropna()
print('\nAfter dropping rows with NaN values:')
print(df2_cleaned)

After dropping rows with NaN values:
    W     X     Y   Z
1  27  24.0  40.0  12
2   1  26.0  20.0   9
3   7   7.0  38.0   2
5  43  43.0  17.0  14
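
`dropna()` with no arguments removes a row if any of its values is missing, which is why rows 0 and 4 disappear above. A small sketch on hard-coded data (not the random frame above) shows two other `dropna` options: `subset` restricts the check to chosen columns, and `thresh` keeps rows that have at least that many non-missing values.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'W': [1, 2, 3, 4],
                    'X': [10, np.nan, 30, np.nan],
                    'Y': [np.nan, 20, 30, np.nan]})

any_nan = df2.dropna()               # default how='any': only row 2 is complete
only_x = df2.dropna(subset=['X'])    # look at column X only: keeps rows 0 and 2
min_two = df2.dropna(thresh=2)       # keep rows with >= 2 non-NaN values: rows 0, 1, 2

print(any_nan.index.tolist(), only_x.index.tolist(), min_two.index.tolist())
```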


### Assignment 4: Data Aggregation

1. Create a Pandas DataFrame with 2 columns: 'Category' and 'Value'. Fill the 'Category' column with random categories ('A', 'B', 'C') and the 'Value' column with random integers. Group the DataFrame by 'Category' and compute the sum and mean of 'Value' for each category.
2. Create a Pandas DataFrame with 3 columns: 'Product', 'Category', and 'Sales'. Fill the DataFrame with random data. Group the DataFrame by 'Category' and compute the total sales for each category.

In [11]:
# Question 1.
df = pd.DataFrame({
    'Category': np.random.choice(['A', 'B', 'C'], size=10),
    'Value': np.random.randint(1, 100, size=10)
})

# Group by Category and compute sum & mean
result = df.groupby('Category')['Value'].agg(['sum', 'mean'])

print(result)

          sum       mean
Category                
A         384  64.000000
B          41  41.000000
C         170  56.666667
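
With `.agg(['sum', 'mean'])` the output columns are named after the functions. Named aggregation lets you choose the column names yourself; `total` and `average` below are arbitrary labels, and the data are hard-coded so the results can be checked by hand.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'C'],
                   'Value': [10, 20, 30, 40, 60]})

# Named aggregation: new_column=(source_column, function)
result = df.groupby('Category').agg(total=('Value', 'sum'),
                                    average=('Value', 'mean'))
print(result)
```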


In [12]:
# Question 2.
df2 = pd.DataFrame({
    'Product': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6'],
    'Category': np.random.choice(['Electronics', 'Clothing', 'Grocery'], size=6),
    'Sales': np.random.randint(100, 1000, size=6)
})

# Group by Category and compute total sales
category_sales = df2.groupby('Category')['Sales'].sum()

print(category_sales)

Category
Electronics    2080
Grocery        1451
Name: Sales, dtype: int32
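
The output above lists only Electronics and Grocery: `np.random.choice` happened not to draw Clothing for any of the six products, and a plain `groupby` only reports categories that actually occur. If every category should appear, one option (sketched here on hard-coded data) is to store the column as a `pd.Categorical` with the full list of categories and pass `observed=False`; the unused category is then reported with a sum of 0.

```python
import pandas as pd

categories = ['Electronics', 'Clothing', 'Grocery']
df2 = pd.DataFrame({
    'Product': ['P1', 'P2', 'P3'],
    # Declare every possible category up front, including ones with no rows
    'Category': pd.Categorical(['Electronics', 'Grocery', 'Electronics'],
                               categories=categories),
    'Sales': [500, 300, 250],
})

# observed=False keeps categories that never occur; their sum is 0
category_sales = df2.groupby('Category', observed=False)['Sales'].sum()
print(category_sales)
```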


### Assignment 5: Merging DataFrames

1. Create two Pandas DataFrames with a common column. Merge the DataFrames using the common column.
2. Create two Pandas DataFrames with different columns. Concatenate the DataFrames along the rows and along the columns.

In [13]:
# Question 1.
# First DataFrame
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['A', 'B', 'C', 'D']
})

# Second DataFrame
df2 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Score': [90, 85, 88, 92]
})

# Merge on the common column 'ID'
merged_df = df1.merge(df2, on='ID')

print(merged_df)

   ID Name  Score
0   1    A     90
1   2    B     85
2   3    C     88
3   4    D     92
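
Here every `ID` appears in both frames, so the default inner join keeps all four rows. When the keys only partly overlap, the `how` argument decides which rows survive, and `indicator=True` adds a `_merge` column recording where each row came from. A sketch with deliberately mismatched IDs:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4], 'Name': ['A', 'B', 'C', 'D']})
df2 = pd.DataFrame({'ID': [3, 4, 5], 'Score': [88, 92, 75]})

inner = df1.merge(df2, on='ID')                               # default how='inner': IDs in both
outer = df1.merge(df2, on='ID', how='outer', indicator=True)  # all IDs, NaN where missing

print(inner)
print(outer)
```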


In [14]:
# Question 2.
df3 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Second DataFrame
df4 = pd.DataFrame({
    'C': [7, 8, 9],
    'D': [10, 11, 12]
})

# Concatenate along rows (stack vertically)
concat_rows = pd.concat([df3, df4], axis=0)

# Concatenate along columns (side-by-side)
concat_columns = pd.concat([df3, df4], axis=1)

print("Concatenated Rows:\n", concat_rows)
print("\nConcatenated Columns:\n", concat_columns)

Concatenated Rows:
      A    B    C     D
0  1.0  4.0  NaN   NaN
1  2.0  5.0  NaN   NaN
2  3.0  6.0  NaN   NaN
0  NaN  NaN  7.0  10.0
1  NaN  NaN  8.0  11.0
2  NaN  NaN  9.0  12.0

Concatenated Columns:
    A  B  C   D
0  1  4  7  10
1  2  5  8  11
2  3  6  9  12
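
Stacking along rows keeps each input's index, so the result above repeats the labels 0, 1, 2. Passing `ignore_index=True` renumbers the rows, while `keys=` instead adds an outer index level that records which input each row came from (the key names `'df3'` and `'df4'` are just labels chosen for this sketch).

```python
import pandas as pd

df3 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df4 = pd.DataFrame({'C': [7, 8, 9], 'D': [10, 11, 12]})

# Renumber the stacked rows 0..5 instead of repeating 0, 1, 2
stacked = pd.concat([df3, df4], axis=0, ignore_index=True)

# Record the source frame as an outer index level
labelled = pd.concat([df3, df4], axis=0, keys=['df3', 'df4'])

print(stacked.index.tolist())   # [0, 1, 2, 3, 4, 5]
print(labelled.loc['df4'])
```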


### Assignment 6: Time Series Analysis

1. Create a Pandas DataFrame with a datetime index and one column filled with random integers. Resample the DataFrame to compute the monthly mean of the values.
2. Create a Pandas DataFrame with a datetime index ranging from '2021-01-01' to '2021-12-31' and one column filled with random integers. Compute the rolling mean with a window of 7 days.

In [15]:
# Question 1.
# Create datetime index (daily range)
dates = pd.date_range(start='2021-01-01', periods=120, freq='D')

df = pd.DataFrame({
    'Value': np.random.randint(1, 100, size=120)
}, index=dates)

# Resample to month-end frequency and compute the mean
# ('ME' is the month-end alias since pandas 2.2; older versions use 'M')
monthly_mean = df.resample('ME').mean()

print(monthly_mean)

                Value
2021-01-31  46.322581
2021-02-28  46.714286
2021-03-31  43.354839
2021-04-30  47.666667
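
pandas 2.2 renamed the month-end resampling alias from `'M'` to `'ME'`. One alternative that does not depend on that alias, sketched below on deterministic data, is to group on calendar-month periods via `DatetimeIndex.to_period('M')`; it also shows several statistics computed per month in one call.

```python
import numpy as np
import pandas as pd

# 59 days = all of January and February 2021; values 0..58 make the results easy to check
dates = pd.date_range(start='2021-01-01', periods=59, freq='D')
df = pd.DataFrame({'Value': np.arange(59)}, index=dates)

# Group by calendar month (PeriodIndex) instead of resampling with an offset alias
monthly = df.groupby(df.index.to_period('M'))['Value'].agg(['mean', 'min', 'max', 'count'])
print(monthly)
```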


In [16]:
# Question 2.
# Create full-year daily datetime index
dates = pd.date_range(start='2021-01-01', end='2021-12-31', freq='D')

df2 = pd.DataFrame({
    'Value': np.random.randint(1, 100, size=len(dates))
}, index=dates)

# Rolling mean over a 7-row window (= 7 days, since there is one row per day)
rolling_mean = df2.rolling(window=7).mean()

print(rolling_mean)

                Value
2021-01-01        NaN
2021-01-02        NaN
2021-01-03        NaN
2021-01-04        NaN
2021-01-05        NaN
...               ...
2021-12-27  65.142857
2021-12-28  60.857143
2021-12-29  55.285714
2021-12-30  41.285714
2021-12-31  44.000000

[365 rows x 1 columns]
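
`rolling(window=7)` counts rows, so it equals a 7-day window only because this index has exactly one row per day with no gaps. On irregular timestamps an offset string such as `'7D'` windows by elapsed time instead. A small sketch with a 2-day window and two missing dates makes the difference visible; offset-based windows default to `min_periods=1`, so they do not start with NaN.

```python
import pandas as pd

# Irregular dates: 2021-01-03 and 2021-01-04 are missing
idx = pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-05', '2021-01-06'])
s = pd.Series([10.0, 20.0, 30.0, 40.0], index=idx)

by_rows = s.rolling(window=2).mean()     # last 2 observations, however far apart
by_time = s.rolling(window='2D').mean()  # observations in the 2 days ending at each date

print(pd.DataFrame({'by_rows': by_rows, 'by_time': by_time}))
```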


### Assignment 7: MultiIndex DataFrame

1. Create a Pandas DataFrame with a MultiIndex (hierarchical index). Perform some basic indexing and slicing operations on the MultiIndex DataFrame.
2. Create a Pandas DataFrame with MultiIndex consisting of 'Category' and 'SubCategory'. Fill the DataFrame with random data and compute the sum of values for each 'Category' and 'SubCategory'.

In [17]:
# Question 1.
# Create MultiIndex
arrays = [
    ['A', 'A', 'A', 'B', 'B', 'C'],
    ['X', 'Y', 'Z', 'X', 'Y', 'X']
]

index = pd.MultiIndex.from_arrays(arrays, names=['Group', 'Label'])

# Create DataFrame
df = pd.DataFrame({'Value': np.random.randint(1, 100, size=6)}, index=index)

print("Full MultiIndex DataFrame:\n", df)

# Indexing single group
print("\nRows under Group 'A':\n", df.loc['A'])

# Indexing a full (Group, Label) key returns that row as a Series
print("\nValue for Group 'B' and Label 'Y':\n", df.loc[('B', 'Y')])

# Slicing across first level
print("\nSlice from Group A to B:\n", df.loc['A':'B'])

Full MultiIndex DataFrame:
              Value
Group Label       
A     X         44
      Y         96
      Z         49
B     X         55
      Y         50
C     X         18

Rows under Group 'A':
        Value
Label       
X         44
Y         96
Z         49

Value for Group 'B' and Label 'Y':
 Value    50
Name: (B, Y), dtype: int32

Slice from Group A to B:
              Value
Group Label       
A     X         44
      Y         96
      Z         49
B     X         55
      Y         50
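
`df.loc['A']` and `df.loc['A':'B']` select on the outer level. To select on the inner `Label` level, `DataFrame.xs` with `level=` or a `pd.IndexSlice` works; slicing an inner level needs a lexsorted index, which this one already is. Adding the column label to a full key returns a scalar instead of a Series. A sketch reusing the printed values:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'A', 'B', 'B', 'C'], ['X', 'Y', 'Z', 'X', 'Y', 'X']],
    names=['Group', 'Label'])
# Values copied from the "Full MultiIndex DataFrame" output above
df = pd.DataFrame({'Value': [44, 96, 49, 55, 50, 18]}, index=index)

label_x = df.xs('X', level='Label')        # every group's 'X' row
idx = pd.IndexSlice
x_to_y = df.loc[idx[:, 'X':'Y'], :]        # all groups, labels 'X' through 'Y'
scalar = df.loc[('B', 'Y'), 'Value']       # full key + column -> a single number

print(label_x)
print(x_to_y)
print(scalar)
```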


In [18]:
# Question 2.
# Create MultiIndex
categories = ['Furniture', 'Furniture', 'Electronics', 'Electronics', 'Grocery', 'Grocery']
subcats = ['Chair', 'Table', 'Mobile', 'Laptop', 'Snacks', 'Beverages']

index = pd.MultiIndex.from_arrays([categories, subcats], names=['Category', 'SubCategory'])

# Create DataFrame with random data
df2 = pd.DataFrame({'Sales': np.random.randint(100, 1000, size=6)}, index=index)

print("Original DataFrame:\n", df2)

# Sum of values for each Category
sum_by_category = df2.groupby(level='Category').sum()
print("\nSum by Category:\n", sum_by_category)

# Sum for each SubCategory
sum_by_subcategory = df2.groupby(level='SubCategory').sum()
print("\nSum by SubCategory:\n", sum_by_subcategory)

Original DataFrame:
                          Sales
Category    SubCategory       
Furniture   Chair          685
            Table          651
Electronics Mobile         879
            Laptop         194
Grocery     Snacks         393
            Beverages      947

Sum by Category:
              Sales
Category          
Electronics   1073
Furniture     1336
Grocery       1340

Sum by SubCategory:
              Sales
SubCategory       
Beverages      947
Chair          685
Laptop         194
Mobile         879
Snacks         393
Table          651
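
In this data every (Category, SubCategory) pair occurs once, so grouping on both levels would simply return the original rows. Grouping on a list of levels matters once pairs repeat; the sketch below uses hard-coded data with a duplicated `('Furniture', 'Chair')` entry and also shows `unstack()`, which turns the inner level into columns.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['Furniture', 'Furniture', 'Grocery', 'Grocery'],
     ['Chair', 'Chair', 'Snacks', 'Beverages']],
    names=['Category', 'SubCategory'])
df2 = pd.DataFrame({'Sales': [100, 50, 30, 20]}, index=index)

# One row per (Category, SubCategory) pair; the two Chair rows are combined
by_pair = df2.groupby(level=['Category', 'SubCategory']).sum()

# Inner level moved to columns; pairs that never occur become NaN
wide = by_pair['Sales'].unstack()

print(by_pair)
print(wide)
```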


### Assignment 8: Pivot Tables

1. Create a Pandas DataFrame with columns 'Date', 'Category', and 'Value'. Create a pivot table to compute the sum of 'Value' for each 'Category' by 'Date'.
2. Create a Pandas DataFrame with columns 'Year', 'Quarter', and 'Revenue'. Create a pivot table to compute the mean 'Revenue' for each 'Quarter' by 'Year'.

In [19]:
# Question 1.
# Create DataFrame
df = pd.DataFrame({
    'Date': pd.date_range(start='2021-01-01', periods=8, freq='D'),
    'Category': np.random.choice(['A', 'B', 'C'], size=8),
    'Value': np.random.randint(10, 100, size=8)
})

# Create Pivot Table: Sum of Value for each Category by Date
pivot_sum = pd.pivot_table(df, values='Value', index='Date', columns='Category', aggfunc='sum')

print(pivot_sum)

Category       A     B     C
Date                        
2021-01-01   NaN   NaN  71.0
2021-01-02  47.0   NaN   NaN
2021-01-03   NaN   NaN  32.0
2021-01-04   NaN  82.0   NaN
2021-01-05   NaN  47.0   NaN
2021-01-06   NaN   NaN  39.0
2021-01-07   NaN   NaN  40.0
2021-01-08  39.0   NaN   NaN


In [20]:
# Question 2.
# Create DataFrame
df2 = pd.DataFrame({
    'Year': np.random.choice([2020, 2021, 2022], size=12),
    'Quarter': np.random.choice(['Q1', 'Q2', 'Q3', 'Q4'], size=12),
    'Revenue': np.random.randint(1000, 10000, size=12)
})

# Create Pivot Table: Mean Revenue per Quarter for each Year
pivot_mean = pd.pivot_table(df2, values='Revenue', index='Year', columns='Quarter', aggfunc='mean')

print(pivot_mean)

Quarter      Q1      Q2      Q3      Q4
Year                                   
2020     3523.0     NaN     NaN     NaN
2021     5488.0  6244.0  3793.0  9901.0
2022     3613.0     NaN     NaN  4234.0
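
Both pivot tables contain NaN wherever a (row, column) combination never occurred in the random data. `pivot_table` can fill those cells with `fill_value` and append row and column totals with `margins=True`; the sketch below uses hard-coded data, and the margin label `'Total'` is an arbitrary choice passed via `margins_name`. Filling with 0 suits sums; for means, as in Question 2, a 0 would be misleading, so leaving NaN is usually the better choice there.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02'],
    'Category': ['A', 'B', 'A'],
    'Value': [10, 20, 30],
})

# Sum per Date x Category, empty cells set to 0, plus a 'Total' row and column
pivot = pd.pivot_table(df, values='Value', index='Date', columns='Category',
                       aggfunc='sum', fill_value=0,
                       margins=True, margins_name='Total')
print(pivot)
```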


### Assignment 9: Applying Functions

1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Apply a function that doubles the values of the DataFrame.
2. Create a Pandas DataFrame with 3 columns and 6 rows filled with random integers. Apply a lambda function to create a new column that is the sum of the existing columns.

In [21]:
# Question 1.
# Create DataFrame
df = pd.DataFrame(np.random.randint(1, 20, size=5*3).reshape(5, 3),
                  columns=['A', 'B', 'C'])

# Apply a function to double all values
df_doubled = df.apply(lambda x: x * 2)

print(df_doubled)

    A   B   C
0  14  34  20
1  12   4  24
2  32  34   6
3   4  32  28
4   6  38  18


In [22]:
# Question 2.
# Create DataFrame
df2 = pd.DataFrame(np.random.randint(1, 30, size=6*3).reshape(6, 3),
                   columns=['X', 'Y', 'Z'])

# New column = sum of existing columns using lambda
df2['Total'] = df2.apply(lambda row: row['X'] + row['Y'] + row['Z'], axis=1)

print(df2)

    X   Y   Z  Total
0  11  22  20     53
1  23  15  12     50
2  29   6  27     62
3  27  20  29     76
4  16   7  23     46
5   9  22   3     34
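
`apply` calls a Python function once per column (Question 1) or once per row (Question 2, `axis=1`). For simple arithmetic like this, pandas' vectorised operators give the same results without the per-row Python calls, which matters on large frames. A sketch with the first three rows of the Question 2 output:

```python
import pandas as pd

# First three rows of the Question 2 output, hard-coded
df2 = pd.DataFrame({'X': [11, 23, 29], 'Y': [22, 15, 6], 'Z': [20, 12, 27]})

doubled = df2 * 2                                  # element-wise, like apply(lambda x: x * 2)
total = df2[['X', 'Y', 'Z']].sum(axis=1)           # row-wise sum without a lambda
via_apply = df2.apply(lambda row: row['X'] + row['Y'] + row['Z'], axis=1)

print(doubled)
print(total.tolist() == via_apply.tolist())        # True
```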


### Assignment 10: Working with Text Data

1. Create a Pandas Series with 5 random text strings. Convert all the strings to uppercase.
2. Create a Pandas Series with 5 random text strings. Extract the first three characters of each string.

In [23]:
# Question 1.
# Create Series with random text strings
s = pd.Series(['apple', 'banana', 'cherry', 'mango', 'orange'])

# Convert all strings to uppercase
s_upper = s.str.upper()

print(s_upper)

0     APPLE
1    BANANA
2    CHERRY
3     MANGO
4    ORANGE
dtype: object


In [24]:
# Question 2.
# Create Series with random text strings
s2 = pd.Series(['tiger', 'lion', 'zebra', 'eagle', 'panda'])

# Extract first three characters
s_first3 = s2.str[:3]

print(s_first3)

0    tig
1    lio
2    zeb
3    eag
4    pan
dtype: object
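
The `.str` accessor also copes with short strings and missing values: `str[:3]` (equivalently `str.slice(0, 3)`) returns the whole string when it has fewer than three characters, and NaN entries stay NaN instead of raising an error. A sketch with made-up strings:

```python
import numpy as np
import pandas as pd

s = pd.Series(['tiger', 'lion', np.nan, 'ox'])

first3 = s.str.slice(0, 3)   # same as s.str[:3]; 'ox' is kept whole
upper = s.str.upper()        # the missing value stays missing
lengths = s.str.len()        # NaN for the missing entry

print(pd.DataFrame({'first3': first3, 'upper': upper, 'lengths': lengths}))
```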
