# GROUP BY in Python (using pandas)

In Python, SQL-style **GROUP BY** operations are performed with the `groupby()` method of the **pandas** library.  
It splits the rows into groups and is mainly used to **apply aggregate functions to each group**.

---

## Why groupby() is Used
- To summarize data
- To perform calculations on groups
- To analyze category-wise data
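
As a minimal sketch of these uses, the example below groups a small made-up table by category and computes several summaries per group. The names `scores`, `Team`, and `Score` and all values are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical data, for illustration only
scores = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "B"],
    "Score": [10, 20, 5, 15, 25],
})

# One row per team, with several aggregates of the Score column
summary = scores.groupby("Team")["Score"].agg(["sum", "mean", "max"])
print(summary)
```

Passing a list of function names to `agg()` produces one output column per aggregate, which is handy when a single summary table is wanted.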

---

## Import Required Library
```python
import pandas as pd
```

In [6]:
data = pd.DataFrame({
    "Department": ["HR", "HR", "HR", "IT", "IT", "IT", "QA", "QA"],
    "Gender": ["M", "F", "M", "M", "F", "M", "F", "F"],
    "Salary": [30000, 32000, 35000, 50000, 52000, 55000, 40000, 42000],
    "Experience_Years": [2, 3, 5, 4, 6, 7, 3, 5]
})

# `data` is already a DataFrame, so it can be printed directly
print(data)

  Department Gender  Salary  Experience_Years
0         HR      M   30000                 2
1         HR      F   32000                 3
2         HR      M   35000                 5
3         IT      M   50000                 4
4         IT      F   52000                 6
5         IT      M   55000                 7
6         QA      F   40000                 3
7         QA      F   42000                 5


In [7]:
data

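|   | Department | Gender | Salary | Experience_Years |
|---|---|---|---|---|
| 0 | HR | M | 30000 | 2 |
| 1 | HR | F | 32000 | 3 |
| 2 | HR | M | 35000 | 5 |
| 3 | IT | M | 50000 | 4 |
| 4 | IT | F | 52000 | 6 |
| 5 | IT | M | 55000 | 7 |
| 6 | QA | F | 40000 | 3 |
| 7 | QA | F | 42000 | 5 |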


In [9]:
data.groupby('Department').count()  # Count non-null values in each column, per Department

| Department | Gender | Salary | Experience_Years |
|---|---|---|---|
| HR | 3 | 3 | 3 |
| IT | 3 | 3 | 3 |
| QA | 2 | 2 | 2 |
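
`count()` tallies non-null values in each column, so it matches the number of rows only when nothing is missing. The sketch below shows the difference from `size()`, which counts rows regardless of missing values. The frame `emp` and its values are hypothetical, with one salary deliberately left missing.

```python
import pandas as pd

# Hypothetical data with one missing salary in HR
emp = pd.DataFrame({
    "Department": ["HR", "HR", "IT"],
    "Salary": [30000, None, 50000],
})

# size() counts rows in each group
rows_per_dept = emp.groupby("Department").size()

# count() counts only the non-null Salary values in each group
non_null_salaries = emp.groupby("Department")["Salary"].count()

print(rows_per_dept)
print(non_null_salaries)
```

If the goal is simply the number of rows per group, `size()` is usually the safer choice.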


In [10]:
data = pd.DataFrame({
    "Phase": [
        "Death Overs","Death Overs","Death Overs","Death Overs",
        "Middle Overs","Middle Overs","Middle Overs","Middle Overs",
        "Power Play","Power Play","Power Play","Power Play"
    ],
    "Outcome": [
        "Boundaries","Dots","Runs","Wicket",
        "Boundaries","Dots","Runs","Wicket",
        "Boundaries","Dots","Runs","Wicket"
    ],
    "Count": [6, 22, 26, 9, 2, 11, 17, 2, 6, 19, 13, 2]
})

In [15]:
result = data.groupby(["Phase", "Outcome"])["Count"].sum()  # Group by Phase and Outcome, then total Count within each group
result

Phase         Outcome   
Death Overs   Boundaries     6
              Dots          22
              Runs          26
              Wicket         9
Middle Overs  Boundaries     2
              Dots          11
              Runs          17
              Wicket         2
Power Play    Boundaries     6
              Dots          19
              Runs          13
              Wicket         2
Name: Count, dtype: int64
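
The result above is a Series with a two-level index (Phase, Outcome). If a wide table is easier to read, one option is `unstack()`. The sketch below uses a cut-down copy of the data (two phases, two outcomes) under the hypothetical name `overs`.

```python
import pandas as pd

# Cut-down copy of the data above, for illustration
overs = pd.DataFrame({
    "Phase": ["Death Overs", "Death Overs", "Power Play", "Power Play"],
    "Outcome": ["Dots", "Wicket", "Dots", "Wicket"],
    "Count": [22, 9, 19, 2],
})

totals = overs.groupby(["Phase", "Outcome"])["Count"].sum()

# Move the inner index level (Outcome) into columns: one row per Phase
wide = totals.unstack("Outcome")
print(wide)
```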

# Pandas date_range()
`pd.date_range()` generates a sequence of dates (a `DatetimeIndex`) automatically.
You tell pandas where to start, where to end (or how many dates you need), and how often, and it generates the dates for you.


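| You want | Use |
|---|---|
| All dates between two dates | `start` + `end` |
| A fixed number of dates | `periods` |
| Weekdays only (skip weekends) | `freq='B'` |
| Monthly data (month end) | `freq='M'` (spelled `'ME'` in pandas 2.2 and later) |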

`pd.date_range()` lets you build date columns or indexes without typing each date by hand.
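
The options above can be sketched as follows. The variable names are illustrative. The monthly example uses the month-start alias `'MS'` rather than the month-end alias, on the assumption that the code should run unchanged across pandas versions (the month-end alias is `'M'` in older releases and `'ME'` from pandas 2.2).

```python
import pandas as pd

# Fixed number of dates: 5 consecutive days starting 2022-02-01
five_days = pd.date_range(start="2022-02-01", periods=5)

# Business days only (Monday to Friday) within one calendar week
weekdays = pd.date_range(start="2022-02-01", end="2022-02-07", freq="B")

# First day of each month, three months in a row
month_starts = pd.date_range(start="2022-01-01", periods=3, freq="MS")

print(five_days)
print(weekdays)
print(month_starts)
```

With `freq="B"`, the weekend of 5 and 6 February 2022 is skipped, so the week from 1 to 7 February yields five dates.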

In [17]:
# Create simple date data
dates = pd.date_range(start='2022-02-01', end='2022-02-07')
dates

DatetimeIndex(['2022-02-01', '2022-02-02', '2022-02-03', '2022-02-04',
               '2022-02-05', '2022-02-06', '2022-02-07'],
              dtype='datetime64[ns]', freq='D')

In [18]:
# Use date range inside a DataFrame
df = pd.DataFrame({
    "Date": pd.date_range(start='2022-02-01', end='2022-02-07'),
    "Sales": [10, 12, 15, 11, 9, 14, 16]
})

df

|   | Date | Sales |
|---|---|---|
| 0 | 2022-02-01 | 10 |
| 1 | 2022-02-02 | 12 |
| 2 | 2022-02-03 | 15 |
| 3 | 2022-02-04 | 11 |
| 4 | 2022-02-05 | 9 |
| 5 | 2022-02-06 | 14 |
| 6 | 2022-02-07 | 16 |


In [35]:
# Using start and end 

pd.date_range(start='2022-02-01', end='2050-02-05')

DatetimeIndex(['2022-02-01', '2022-02-02', '2022-02-03', '2022-02-04',
               '2022-02-05', '2022-02-06', '2022-02-07', '2022-02-08',
               '2022-02-09', '2022-02-10',
               ...
               '2050-01-27', '2050-01-28', '2050-01-29', '2050-01-30',
               '2050-01-31', '2050-02-01', '2050-02-02', '2050-02-03',
               '2050-02-04', '2050-02-05'],
              dtype='datetime64[ns]', length=10232, freq='D')