✅ Day 3: Grouping, Time Series, and DateTime Handling in Pandas

# 📆 Day 3: Grouping and Working with DateTime in Pandas

Welcome to Day 3 of the 45-Day Data Science with AI Challenge!  
Today, we’ll cover 3 powerful topics:

1. 📊 Grouping Data using `groupby()`
2. 🕒 Introduction to Time Series
3. 🧮 Converting Strings to DateTime format

Let’s dive in!


🔹 Part 1: Grouping Data with groupby()

In [12]:
import pandas as pd

# Sample dataset: Date-wise product sales
data = {
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03'],
    'Product': ['A', 'B', 'A', 'A', 'C'],
    'Sales': [100, 150, 120, 80, 200]
}

df = pd.DataFrame(data)
df


         Date Product  Sales
0  2023-01-01       A    100
1  2023-01-01       B    150
2  2023-01-02       A    120
3  2023-01-02       A     80
4  2023-01-03       C    200


We have a simple dataset of product sales over a few days.
Now, let’s summarize total sales **per day**.


In [15]:
# Group by date and sum sales
grouped = df.groupby('Date')['Sales'].sum()
print(grouped)


Date
2023-01-01    250
2023-01-02    200
2023-01-03    200
Name: Sales, dtype: int64
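
As a small sketch of what the grouped sum does, the block below rebuilds the same sample data, keeps `Date` as a regular column with `as_index=False`, and checks one group by hand. The names `daily_totals` and `manual_total` are illustrative and not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03'],
    'Product': ['A', 'B', 'A', 'A', 'C'],
    'Sales': [100, 150, 120, 80, 200],
})

# as_index=False returns a DataFrame with 'Date' as a normal column
# instead of using it as the index of the result
daily_totals = df.groupby('Date', as_index=False)['Sales'].sum()
print(daily_totals)

# Manual check of a single group: filter the rows for one date, then sum
manual_total = df.loc[df['Date'] == '2023-01-01', 'Sales'].sum()
print(manual_total)
```

Keeping `Date` as a column can be convenient when the result is going to be merged or plotted later.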


### 📌 Group by Multiple Columns (Date + Product)


In [19]:
# Group by date and product to get sum and max sales
df.groupby(['Date', 'Product'])['Sales'].agg(['sum', 'max'])


                    sum  max
Date       Product
2023-01-01 A        100  100
           B        150  150
2023-01-02 A        200  120
2023-01-03 C        200  200
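
One possible variation on the multi-column grouping is pandas' named aggregation, which lets you choose the output column names. This is a sketch on the same sample data; `total_sales` and `largest_sale` are names made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03'],
    'Product': ['A', 'B', 'A', 'A', 'C'],
    'Sales': [100, 150, 120, 80, 200],
})

# Named aggregation: output_name=(input_column, aggregation_function)
summary = (
    df.groupby(['Date', 'Product'])
      .agg(total_sales=('Sales', 'sum'),
           largest_sale=('Sales', 'max'))
      .reset_index()  # turn the Date/Product index levels back into columns
)
print(summary)
```

Note that product A on 2023-01-02 has two rows (120 and 80), which is why its sum and max differ.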


💼 Example: Grouping a Portfolio


In [26]:
# A sample portfolio of stocks
data = {
    'Sector': ['IT', 'FMCG', 'Finance', 'Pharma', 'Pharma',
               'FMCG', 'FMCG', 'IT', 'Finance', 'Real Estate'],
    'Company': ['TCS', 'HINDUNILVR', 'HDFCBANK', 'Sun Pharma', 'Lupin',
                'Adani Wilmar', 'Britianna', 'Persistent Systems', 'Bajaj Finance', 'DLF'],
    'MarketCap': ['Large Cap', 'Large Cap', 'Mid Cap', 'Mid Cap', 'Mid Cap',
                  'Small Cap', 'Mid Cap', 'Small Cap', 'Large Cap', 'Mid Cap'],
    'Share Price': [3250, 2100, 1500, 930, 700, 230, 5231, 750, 1887, 239],
    'Amount Invested': [500000, 43000, 20000, 34000, 55000, 15000, 42000, 19000, 5500, 9000]
}

portfolio = pd.DataFrame(data)
portfolio


        Sector             Company  MarketCap  Share Price  Amount Invested
0           IT                 TCS  Large Cap         3250           500000
1         FMCG          HINDUNILVR  Large Cap         2100            43000
2      Finance            HDFCBANK    Mid Cap         1500            20000
3       Pharma          Sun Pharma    Mid Cap          930            34000
4       Pharma               Lupin    Mid Cap          700            55000
5         FMCG        Adani Wilmar  Small Cap          230            15000
6         FMCG           Britianna    Mid Cap         5231            42000
7           IT  Persistent Systems  Small Cap          750            19000
8      Finance       Bajaj Finance  Large Cap         1887             5500
9  Real Estate                 DLF    Mid Cap          239             9000


In [29]:
# Group by Sector
portfolio.groupby('Sector').groups

{'FMCG': [1, 5, 6], 'Finance': [2, 8], 'IT': [0, 7], 'Pharma': [3, 4], 'Real Estate': [9]}
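
The `.groups` mapping only lists row labels. As a hedged sketch, `size()` and `get_group()` are two ways to look inside the grouped object; the reduced `portfolio` below copies just two columns from the sample data, and `by_sector`, `group_sizes`, and `fmcg` are illustrative names.

```python
import pandas as pd

# Reduced copy of the sample portfolio (only the columns needed here)
portfolio = pd.DataFrame({
    'Sector': ['IT', 'FMCG', 'Finance', 'Pharma', 'Pharma',
               'FMCG', 'FMCG', 'IT', 'Finance', 'Real Estate'],
    'Amount Invested': [500000, 43000, 20000, 34000, 55000,
                        15000, 42000, 19000, 5500, 9000],
})

by_sector = portfolio.groupby('Sector')

# Number of rows (holdings) in each sector
group_sizes = by_sector.size()
print(group_sizes)

# Retrieve the rows of a single group by its label
fmcg = by_sector.get_group('FMCG')
print(fmcg)
```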

In [31]:
# Loop through groups by Sector
for name, group in portfolio.groupby('Sector'):
    print(f"{name}\n{group}\n")


FMCG
  Sector       Company  MarketCap  Share Price  Amount Invested
1   FMCG    HINDUNILVR  Large Cap         2100            43000
5   FMCG  Adani Wilmar  Small Cap          230            15000
6   FMCG     Britianna    Mid Cap         5231            42000

Finance
    Sector        Company  MarketCap  Share Price  Amount Invested
2  Finance       HDFCBANK    Mid Cap         1500            20000
8  Finance  Bajaj Finance  Large Cap         1887             5500

IT
  Sector             Company  MarketCap  Share Price  Amount Invested
0     IT                 TCS  Large Cap         3250           500000
7     IT  Persistent Systems  Small Cap          750            19000

Pharma
   Sector     Company MarketCap  Share Price  Amount Invested
3  Pharma  Sun Pharma   Mid Cap          930            34000
4  Pharma       Lupin   Mid Cap          700            55000

Real Estate
        Sector Company MarketCap  Share Price  Amount Invested
9  Real Estate     DLF   Mid Cap          239             9000
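
Looping over groups is handy for inspection, but for a summary you would usually aggregate instead. The sketch below totals the amount invested per sector and expresses each total as a percentage of the portfolio; `sector_totals` and `sector_share` are names introduced only for this example.

```python
import pandas as pd

# Reduced copy of the sample portfolio (only the columns needed here)
portfolio = pd.DataFrame({
    'Sector': ['IT', 'FMCG', 'Finance', 'Pharma', 'Pharma',
               'FMCG', 'FMCG', 'IT', 'Finance', 'Real Estate'],
    'Amount Invested': [500000, 43000, 20000, 34000, 55000,
                        15000, 42000, 19000, 5500, 9000],
})

# Total amount invested per sector, largest first
sector_totals = (
    portfolio.groupby('Sector')['Amount Invested']
             .sum()
             .sort_values(ascending=False)
)
print(sector_totals)

# Each sector's share of the whole portfolio, in percent
sector_share = (sector_totals / sector_totals.sum() * 100).round(1)
print(sector_share)
```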

🔹 Part 2: Time Series Basics

In [37]:
# A sample time series data
time_data = {
    'Timestamp': ['2023-01-01 10:00', '2023-01-01 10:01', '2023-01-01 10:02'],
    'Value': [100, 110, 105]
}

ts_df = pd.DataFrame(time_data)
ts_df


          Timestamp  Value
0  2023-01-01 10:00    100
1  2023-01-01 10:01    110
2  2023-01-01 10:02    105


Right now, 'Timestamp' is just a string.  
To do time-based analysis, we need to convert it to a proper DateTime format.
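
A quick way to confirm this, sketched under the assumption that the column was built from plain Python strings as above, is to check the column type and see that the `.dt` accessor is not available yet. The variable names `is_datetime` and `dt_accessor_works` are illustrative.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

ts_df = pd.DataFrame({
    'Timestamp': ['2023-01-01 10:00', '2023-01-01 10:01', '2023-01-01 10:02'],
    'Value': [100, 110, 105],
})

# The column holds text, not datetime values
is_datetime = is_datetime64_any_dtype(ts_df['Timestamp'])
print(is_datetime)

# The .dt accessor only works on datetime-like columns,
# so using it on a string column raises AttributeError
try:
    ts_df['Timestamp'].dt.hour
    dt_accessor_works = True
except AttributeError:
    dt_accessor_works = False
print(dt_accessor_works)
```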


🕒 Part 3: Convert String to DateTime

In [41]:
# Convert 'Timestamp' column to datetime format
ts_df['Timestamp'] = pd.to_datetime(ts_df['Timestamp'])

# Now we can access parts like hour, minute, etc.
ts_df['Hour'] = ts_df['Timestamp'].dt.hour
ts_df['Minute'] = ts_df['Timestamp'].dt.minute

ts_df


            Timestamp  Value  Hour  Minute
0 2023-01-01 10:00:00    100    10       0
1 2023-01-01 10:01:00    110    10       1
2 2023-01-01 10:02:00    105    10       2
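
Two follow-ups that often come next are sketched below. The first passes an explicit `format` and `errors='coerce'` to `pd.to_datetime()` so that bad entries become `NaT` instead of raising an error; the `'not a date'` entry is invented for the illustration. The second sets the converted column as the index and uses `resample()` to average values in 2-minute bins. The 2-minute window and the variable names are assumptions, not part of the original notebook.

```python
import pandas as pd

# 1) Parsing with an explicit format; the last entry is deliberately invalid
raw = pd.Series(['2023-01-01 10:00', '2023-01-01 10:01', 'not a date'])
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M', errors='coerce')
print(parsed)

# Count the entries that could not be parsed (they are NaT now)
failed_count = int(parsed.isna().sum())
print(failed_count)

# 2) Time-based aggregation on a datetime index
ts_df = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2023-01-01 10:00',
                                 '2023-01-01 10:01',
                                 '2023-01-01 10:02']),
    'Value': [100, 110, 105],
})

# resample() groups rows into fixed time bins, similar to groupby() on time
two_min_mean = ts_df.set_index('Timestamp')['Value'].resample('2min').mean()
print(two_min_mean)
```

`resample()` needs a datetime index (or a datetime column named with `on=`), which is one practical reason to convert strings first.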


✅ Summary


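🔹`groupby()` splits rows into groups by one or more columns so you can aggregate each group (for example with `sum()` or `agg()`)

🔹A time series is data where each value is tied to a timestamp

🔹Use `pd.to_datetime()` to convert strings into datetime values, then use the `.dt` accessor to pull out parts such as the hour and minute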

🎉 That’s it for Day 3!

You’ve learned:
- How to group data
- Basics of time series
- Converting and working with datetime columns

📢 Stay tuned for more and share your journey!
#45DaysOfDataScience #Python #Pandas #TimeSeries #AIChallenge
