# Pandas for Data Analytics 
### Importing 
```python
import pandas as pd
```

### Loading a dataset 
```python
df = pd.read_csv("my_data.csv")
```
 
### General Data Info 
- `df.info()` -> Prints a concise summary of the dataframe: each column's name, non-null count, and data type

- `df.head()` -> Returns the first 5 rows of the dataframe (pass a number, e.g. `df.head(10)`, to change this)

- `df.tail()` -> Returns the last 5 rows of the dataframe

- `df.describe()` -> Returns summary statistics (count, mean, standard deviation, min, quartiles, max) for the numeric columns in the dataframe
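A quick sketch of these methods on a small, hypothetical dataframe built in memory, so no CSV file is needed. The `name`, `age`, and `score` columns and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for pd.read_csv("my_data.csv")
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cai", "Dee", "Eli", "Fay", "Gus"],
    "age": [23, 35, 51, 62, 47, 58, 30],
    "score": [88, 72, 95, 64, 81, 90, 77],
})

df.info()              # prints column names, non-null counts and dtypes (returns None)
first = df.head()      # first 5 rows by default
last = df.tail(2)      # last 2 rows
stats = df.describe()  # count, mean, std, min, quartiles, max of numeric columns only

print(first)
print(stats)
```

Note that `describe()` skips the text column `name` by default and only summarizes `age` and `score`.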



# Data Filtering with Pandas 

### Column Statistics 
- `df["column name"].min()` : returns the minimum value of the column
- `df["column name"].max()` : returns the maximum value of the column
- `df["column name"].mean()` : returns the mean (average) value of the column
- `df["column name"].median()` : returns the median value of the column
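A minimal sketch of these four calls on a hypothetical `age` column (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"age": [23, 35, 51, 62, 47]})

print(df["age"].min())     # 23
print(df["age"].max())     # 62
print(df["age"].mean())    # 43.6  (218 / 5)
print(df["age"].median())  # 47.0  (middle value of the sorted column)
```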

### Data Filtering 
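We filter our data so that this new dataframe only includes rows where age is greater than 50:

```python
# Keep only the rows where age is greater than 50
new_df = df[df["age"] > 50]

# Select the "score" column of the filtered dataframe
score = new_df["score"]

# Or filter and aggregate in one step: mean score of everyone older than 20
avg_score = df[df["age"] > 20]["score"].mean()
```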
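Filters can also combine several conditions. The sketch below uses a hypothetical `age`/`score` dataframe; each condition is wrapped in parentheses and joined with `&` (and) or `|` (or), and `.loc` is shown as a way to filter rows and pick a column in one step.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "age": [23, 35, 51, 62, 47, 58],
    "score": [88, 72, 95, 64, 81, 90],
})

# Both conditions must hold: older than 50 AND score of at least 80
older_high = df[(df["age"] > 50) & (df["score"] >= 80)]

# Either condition may hold: younger than 30 OR score above 90
young_or_top = df[(df["age"] < 30) | (df["score"] > 90)]

# .loc[row_filter, column] filters rows and selects a column together
older_scores = df.loc[df["age"] > 50, "score"]

print(older_high)
print(older_scores.mean())  # 83.0
```

Python's `and`/`or` keywords do not work element-wise on a Series, which is why the `&` and `|` operators are used here.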



# Grouping Data 

## Filtering vs Grouping 

### Filtering 
Searches the dataset for rows that match a certain condition 
1. A new dataframe is created containing only the rows that meet the condition
2. The original row order does not change 
3. Rows that don't meet the condition are excluded

### Grouping 
Creates groups based on one or more characteristics 
1. The dataframe is condensed into a smaller one that contains summary information (one row per group)
2. New series (columns) can be added or existing ones changed

Grouping is a way of processing data in which rows are combined into groups based on one or more characteristics.
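A small sketch contrasting the two on a hypothetical dataframe (the `country`, `city`, and `score` columns and their values are made up): filtering keeps the matching rows with all their columns, while grouping condenses the data to one summary value per group.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "country": ["Canada", "Canada", "Japan", "Japan", "Japan"],
    "city": ["Toronto", "Ottawa", "Tokyo", "Osaka", "Kyoto"],
    "score": [70, 85, 90, 60, 75],
})

# Filtering: same columns, fewer rows, original order kept
filtered = df[df["score"] > 65]

# Grouping: one row per country, holding the mean score of that group
grouped = df.groupby("country")["score"].mean()

print(filtered)
print(grouped)  # Canada 77.5, Japan 75.0
```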

![image.png](attachment:image.png)

### Pandas groupby() 
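Groups data by one or more columns (characteristics) so that statistics can be calculated for each group.

```python
# Group by one column, then select the column to compute statistics on
grouped_df = df.groupby(by='group_column')['value_column']
```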

`.reset_index()` : Resets the index of a DataFrame, turning the current index into a column and creating a new default integer index. After a `groupby`, this moves the group keys back into a regular column.
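A minimal sketch of `reset_index()` after a `groupby`, using a hypothetical `Country` / `Growth Rate` dataframe with invented values. It also shows `as_index=False`, a `groupby` parameter that keeps the group keys as a column from the start.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Country": ["Canada", "Canada", "Japan"],
    "Growth Rate": [1.0, 2.0, 0.5],
})

avg = df.groupby(by="Country")["Growth Rate"].mean()
print(avg.index.name)  # Country -> the group keys form the index

# reset_index() turns the Country index back into an ordinary column
flat = avg.reset_index()
print(flat)

# Equivalent result without a separate reset_index() call
alt = df.groupby(by="Country", as_index=False)["Growth Rate"].mean()
```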

![image-2.png](attachment:image-2.png)

```python
# Group the rows by country, then average the growth rate within each country
grouped_df = df.groupby(by='Country')
avg_growth = grouped_df["Growth Rate"].mean()

print(avg_growth)
```

![image-3.png](attachment:image-3.png)

# Multiple Groupings 
![image-4.png](attachment:image-4.png)

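```python
# Group by country first, then by city within each country
grouped_df = df.groupby(by=['Country', 'City'])

# Total population for every (Country, City) pair
total_pop = grouped_df['Population (2024)'].sum()

print(total_pop)
```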
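When grouping by two columns, the result has a two-level index (a MultiIndex). A sketch with hypothetical rows (the population figures are invented) showing how to look up one group with a tuple of keys and how `reset_index()` flattens the result:

```python
import pandas as pd

# Hypothetical data; the column names mirror the example above
df = pd.DataFrame({
    "Country": ["Canada", "Canada", "Canada", "Japan"],
    "City": ["Toronto", "Toronto", "Ottawa", "Tokyo"],
    "Population (2024)": [100, 50, 30, 200],
})

total_pop = df.groupby(by=["Country", "City"])["Population (2024)"].sum()

# Two grouping columns -> two index levels, so use a (Country, City) tuple
print(total_pop.loc[("Canada", "Toronto")])  # 150

# Flatten back to ordinary Country / City / Population columns
print(total_pop.reset_index())
```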
![image-5.png](attachment:image-5.png)



# Aggregating Data in Pandas 

### Pandas aggregate() 
`aggregate()` (or its shorter alias `agg()`) lets us apply multiple functions to grouped data at once to calculate statistics.

`.aggregate(["function1", "function2", "function3"])`, `.agg(["function1", "function2", "function3"])`, or `.apply(custom_function_here)` 

```python
grouped_df = df.groupby(by=['Country', 'City'])

# Minimum, maximum and mean population for every (Country, City) group
total_pop = grouped_df['Population (2024)'].agg(["min", "max", "mean"])
print(total_pop)
```
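Built-in function names and your own functions can be mixed in the same `agg()` list. A sketch on hypothetical data, where `pop_range` is a made-up custom function (largest minus smallest value); its name becomes the column label in the result.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Country": ["Canada", "Canada", "Japan", "Japan"],
    "Population (2024)": [100, 40, 200, 120],
})

def pop_range(s):
    """Hypothetical custom aggregation: largest minus smallest value in a group."""
    return s.max() - s.min()

grouped = df.groupby(by="Country")["Population (2024)"]

# Strings name built-in aggregations; the function is called once per group
stats = grouped.agg(["min", "max", "mean", pop_range])
print(stats)
```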

![image.png](attachment:image.png)

# Uncleaned Data 

### Issues with our data: some columns are not float64 or int64, but rather a combination of numbers and text

![image.png](attachment:image.png)

## Issues with the data 
![image-2.png](attachment:image-2.png)

These columns should all be float64, not object. Converting them will allow us to apply filters and groupings based on their values for better analysis.

![image-3.png](attachment:image-3.png)

We will learn to convert these columns to a workable data type; in our case, float64 (decimal numbers).

### Data Cleaning Methods 
- `isnull()` : Checks each value for nulls (missing data); chain `.sum()` to count them per column
- `duplicated()` : Flags rows that are duplicates of an earlier row
- `dropna()` : Drops any rows with missing values
- `fillna()` : Fills missing (null) data with a given value
- `drop_duplicates()` : Drops any duplicate rows
- `pd.to_numeric()` : Converts a series to a numeric data type
- `apply()` : Applies a custom function to a series of data
- `.shape` : Attribute (not a method) holding the shape of the data as (# of rows, # of columns)
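A minimal sketch of the checking methods on a tiny, hypothetical `data` frame (the `Ram` and `Price` values are invented), with one exact duplicate row and one row of missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical messy data for illustration
data = pd.DataFrame({
    "Ram": ["8GB", "16GB", "8GB", None],
    "Price": [999.0, 1299.0, 999.0, np.nan],
})

print(data.shape)               # (4, 2) -> 4 rows, 2 columns
print(data.isnull().sum())      # missing values per column: Ram 1, Price 1
print(data.duplicated().sum())  # 1 -> the third row repeats the first

# Remove the duplicate row, then the row with missing values
clean = data.drop_duplicates().dropna()
print(clean.shape)              # (2, 2)
```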

### Methods in Action 

```python
# Initial check: number of missing values per column and number of duplicate rows
print(data.isnull().sum())
print(data.duplicated().sum())

# Drop any rows where these critical columns are missing data
critical_columns = ["Cpu", "Ram", "Memory", "Weight", "Price"]
data = data.dropna(subset=critical_columns)

# Drop all duplicate rows (if any)
data = data.drop_duplicates()

# Remove the "kg" unit from Weight, then convert the text to a float;
# errors='coerce' turns any value that still can't be parsed into NaN
data["Weight"] = data["Weight"].str.replace('kg', '', regex=False)
data["Weight"] = pd.to_numeric(data["Weight"], errors='coerce')
```
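A sketch of the unit-stripping and conversion steps on a few hypothetical `Weight` strings (invented values, including one unparseable `"?"`), showing the resulting dtype and how `errors="coerce"` handles bad values:

```python
import pandas as pd

# Hypothetical raw values as they might appear in the "Weight" column
data = pd.DataFrame({"Weight": ["1.37kg", "2kg", "1.86kg", "?"]})

# Strip the unit text, then convert; "?" cannot be parsed, so it becomes NaN
data["Weight"] = data["Weight"].str.replace("kg", "", regex=False)
data["Weight"] = pd.to_numeric(data["Weight"], errors="coerce")

print(data["Weight"].dtype)     # float64
print(data["Weight"].tolist())  # [1.37, 2.0, 1.86, nan]
```

Without `errors="coerce"`, `pd.to_numeric` raises an error on the `"?"` value instead of producing NaN.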

### Filling any missing data 

```python
# Fill any missing values with the average (mean) of that column
data["Weight"] = data["Weight"].fillna(data["Weight"].mean())
data["Price"] = data["Price"].fillna(data["Price"].mean())
```
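A sketch of mean-filling on a hypothetical `Price` column with two gaps (values invented). `mean()` ignores NaN, so the fill value here is (1000 + 1400) / 2 = 1200. The result is assigned back to the column: recent pandas versions warn about calling `fillna(..., inplace=True)` on a single selected column, because that chained pattern may not modify the original dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with gaps
data = pd.DataFrame({"Price": [1000.0, np.nan, 1400.0, np.nan]})

# Assign the filled column back rather than using inplace=True on data["Price"]
data["Price"] = data["Price"].fillna(data["Price"].mean())

print(data["Price"].tolist())  # [1000.0, 1200.0, 1400.0, 1200.0]
```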






# Intro to Data Visualization 

## Use Matplotlib 

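```python
import pandas as pd
import matplotlib.pyplot as plt
```

- `df.plot(x="column", y="column", kind="line")` -> Quickly constructs charts from a dataframe (`kind` can also be "bar", "hist", "scatter", etc.)
- `plt.show()` -> Displays the charts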
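A minimal sketch of `df.plot()` on a hypothetical `month` / `price` dataframe (invented values). `df.plot` returns a Matplotlib `Axes` object, so labels and titles can be set on it directly.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "month": [1, 2, 3, 4],
    "price": [100, 120, 90, 130],
})

# Line chart of price over month; title= sets the chart title
ax = df.plot(x="month", y="price", kind="line", title="Price by Month")
ax.set_ylabel("Price")

plt.show()
```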

## Types of Charts 

![image.png](attachment:image.png)


## Pandas Plot in Action 

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("airbnb.csv")

# Line plot of the price column (the x-axis is the row index)
df["price"].plot()
plt.ylabel('Price')
plt.title('Line Plot of Price')
plt.show()

# Scatter plot of price against number of reviews
df.plot(kind='scatter', x='price', y='number_of_reviews')
plt.title('Scatter Plot of Price vs Number of Reviews')
plt.show()
```

## Create a Subplot 
![image-2.png](attachment:image-2.png)

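```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumption: x and y come from the airbnb dataset used above
df = pd.read_csv("airbnb.csv")
x = df["price"]
y = df["number_of_reviews"]

# 2x2 grid of charts; axs is a 2D array of Axes objects
fig, axs = plt.subplots(2, 2, figsize=(12, 10))

# Top-left: histogram of prices
axs[0, 0].hist(x, bins=20, color="lightblue")
axs[0, 0].set_title("Price vs Frequency")
axs[0, 0].set_xlabel("Price")
axs[0, 0].set_ylabel("Frequency")

# Top-right: scatter plot
axs[0, 1].scatter(x, y, color="purple")

# Bottom-left: bar chart
axs[1, 0].bar(x, y, color="green")

# Bottom-right: line chart (Axes has no .line() method; .plot() draws lines)
axs[1, 1].plot(x, y, color="red")

plt.tight_layout()
plt.show()
```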

![image-3.png](attachment:image-3.png)


