# Cleaning Data / Advanced Pandas
If you want to type along with me, use [this notebook](https://humboldt.cloudbank.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fbethanyj0%2Fdata271_sp24&branch=main&urlpath=tree%2Fdata271_sp24%2Fdemos%2Fdata271_demo29_live.ipynb) instead. 
If you don't want to type and want to follow along just by executing the cells, stay in this notebook. 

In [1]:
import numpy as np
import pandas as pd

### Reshaping data

In [5]:
df_weather_wide = pd.read_csv('sample_weather.csv')
# drop the first column (presumably the old index that was saved with the CSV)
df_weather_wide = df_weather_wide.iloc[:,1:]
df_weather_wide

Unnamed: 0,date,inches_of_rain,max_temp,min_temp
0,2024-01-01,0.5,60,40
1,2024-01-02,0.0,55,45
2,2024-01-03,0.1,52,42
3,2022-01-04,0.0,56,48


In [6]:
# transpose the data
df_weather_wide.T

Unnamed: 0,0,1,2,3
date,2024-01-01,2024-01-02,2024-01-03,2022-01-04
inches_of_rain,0.5,0.0,0.1,0.0
max_temp,60,55,52,56
min_temp,40,45,42,48


In [None]:
# transpose with more informative columns
df_weather_wide.set_index('date').T

In [7]:
# change wide format data into long format
long_df = df_weather_wide.melt(id_vars = 'date',value_vars = ['max_temp','min_temp','inches_of_rain'])
long_df

Unnamed: 0,date,variable,value
0,2024-01-01,max_temp,60.0
1,2024-01-02,max_temp,55.0
2,2024-01-03,max_temp,52.0
3,2022-01-04,max_temp,56.0
4,2024-01-01,min_temp,40.0
5,2024-01-02,min_temp,45.0
6,2024-01-03,min_temp,42.0
7,2022-01-04,min_temp,48.0
8,2024-01-01,inches_of_rain,0.5
9,2024-01-02,inches_of_rain,0.0
10,2024-01-03,inches_of_rain,0.1
11,2022-01-04,inches_of_rain,0.0


In [None]:
# change long format back into wide format
long_df.pivot(index = 'date',columns = 'variable',values='value')
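
If you don't have `sample_weather.csv` handy, here is a minimal sketch of the same round trip on a small inline table. The frame below is a stand-in built from the first three rows shown above, and the names `wide`, `long_version`, and `wide_again` are just for illustration.

```python
import pandas as pd

# Stand-in for sample_weather.csv (first three rows of the output above)
wide = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "inches_of_rain": [0.5, 0.0, 0.1],
    "max_temp": [60, 55, 52],
    "min_temp": [40, 45, 42],
})

# Wide -> long: one row per (date, variable) pair
long_version = wide.melt(id_vars="date", var_name="variable", value_name="value")

# Long -> wide: each distinct entry in `variable` becomes a column again
wide_again = long_version.pivot(index="date", columns="variable", values="value")

# pivot leaves `date` as the index and names the column axis "variable";
# undo both to get back to a flat table
wide_again = wide_again.reset_index()
wide_again.columns.name = None

print(wide_again)
```

Note that `pivot` sorts the new columns alphabetically, and that every value comes back as a float because `melt` stacked integers and floats into a single `value` column.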

### What to do when there are multiple values in categories

In [None]:
long_df = pd.read_csv('long_data.csv')
# drop the first column (presumably the old index that was saved with the CSV)
long_df = long_df.iloc[:,1:]
long_df.head()

In [None]:
# Pivot the data to get average sales by date and category
long_df.pivot_table(index=['date'], columns='category', values=['sales'])

In [None]:
# Pivot the data to get TOTAL sales by date and category
wide_df = long_df.pivot_table(index=['date'], columns='category', values=['sales'], aggfunc='sum')
wide_df
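
To see why we switch from `pivot` to `pivot_table` here, it can help to try both on a tiny table with a repeated (date, category) pair. This is only a sketch: the `sales` frame below is made up for illustration and is not the contents of `long_data.csv`.

```python
import pandas as pd

# Made-up data: two "A" sales on the same date, so (date, category) repeats
sales = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-01", "2024-01-02"],
    "category": ["A", "A", "B", "A"],
    "sales": [10, 30, 5, 20],
})

# pivot cannot decide which of the duplicate values to keep, so it raises
try:
    sales.pivot(index="date", columns="category", values="sales")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# pivot_table aggregates the duplicates; the default aggfunc is the mean
mean_table = sales.pivot_table(index="date", columns="category", values="sales")

# aggfunc="sum" adds the duplicates up instead
sum_table = sales.pivot_table(index="date", columns="category",
                              values="sales", aggfunc="sum")

print(mean_table)
print(sum_table)
```

Combinations that never occur in the data (category B on 2024-01-02 here) show up as `NaN` in both tables.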

In [None]:
# Pivot the data to get TOTAL sales by date, product, and category
long_df.pivot_table(index='date', columns=['category','product'], values=['sales'], aggfunc='sum')

In [None]:
# Go from wide to long
wide_df.reset_index().melt(id_vars='date', var_name=['type','category'])
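
Passing `values=['sales']` as a list is what gives `wide_df` two levels of column labels. As a hedged alternative, passing the plain string `values='sales'` keeps the columns single-level, which makes the trip back to long format a little simpler. The `sales` frame below is made up for illustration.

```python
import pandas as pd

# Made-up data, not the contents of long_data.csv
sales = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-01", "2024-01-02"],
    "category": ["A", "A", "B", "A"],
    "sales": [10, 30, 5, 20],
})

# values="sales" (a string, not a list) keeps the columns single-level: A, B
wide = sales.pivot_table(index="date", columns="category",
                         values="sales", aggfunc="sum")

# Back to long: one row per (date, category), with the totals in `sales`
long_again = wide.reset_index().melt(id_vars="date",
                                     var_name="category",
                                     value_name="sales")
print(long_again)
```

The long table has one row for every date and category combination, including the missing one, which carries a `NaN`.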

## Activity

In [None]:

# Create a DataFrame with data cleaning and reshaping opportunities
data = {
    'Pet Name': ['Fluffy', 'Whiskers', 'Bubbles', 'Spike', 'Coco', 'Maybelle', 'Snowball'],
    'Date Adopted': ['10-01-2023','03-04-2024','01-10-2024','02-14-2024','11-22-2023','01-04-2024','12-25-2025'],
    'Animal Type': ['Cat', 'Cat', 'Fish', 'Dog', 'Fish', 'Dog', 'Cat'],
    'Pet Age': ['3', '2', '13', '5', '4', '3', '2'],
    'Color': ['White', 'Gray', 'Orange', 'White', 'White', 'Black', 'Black'],
    'Happiness Level': ['High', 'Medium', 'High', 'Low', 'High', 'High', 'Medium']
}
df_pets = pd.DataFrame(data)
df_pets

**Activity 1:** Rename the columns of the pets dataframe to be in a better format.

**Activity 2:** Change any datatypes that should be adjusted.  

**Activity 3:** Practice pivoting the dataframe.