## Dictionaries and Pandas

#### Dictionaries
- Motivation for Dictionaries:
Dictionaries are an essential data structure in Python, storing data as key-value pairs. Unlike lists, which retrieve values by integer position (0, 1, 2, ...), dictionaries let you retrieve values by descriptive keys. They are extremely useful when you need to associate a piece of data with a label or unique identifier.
- Create Dictionary:
A dictionary in Python is created using curly braces {}. Each key is separated from its value by a colon, and the key-value pairs are separated by commas.

```python
# Create a dictionary
student = {"name": "John", "age": 21, "major": "Computer Science"}
```

In the above example:  
- "name", "age", and "major" are the keys.
- "John", 21, and "Computer Science" are the values associated with these keys.
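
A dictionary also exposes its keys and values directly. The short sketch below recreates the `student` dictionary from above and shows `.keys()`, `.values()`, the `in` operator and `len()`. Note that `in` checks keys, not values.

```python
student = {"name": "John", "age": 21, "major": "Computer Science"}

# keys() and values() return live views of the dictionary's keys and values
print(list(student.keys()))    # ['name', 'age', 'major']
print(list(student.values()))  # ['John', 21, 'Computer Science']

# The `in` operator checks keys, not values
print("age" in student)   # True
print("John" in student)  # False
print(len(student))       # 3
```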

1. Access Dictionary:  
To access a value in the dictionary, you use the key:

```python
print("Name:", student["name"])  # Name: John
print("Age:", student["age"])    # Age: 21
```

```
Name: John
Age: 21
```


You can also use the .get() method to access a value. Square brackets raise a KeyError when the key is missing, but .get() returns None (or a default value you supply) instead:

```python
print(student.get("name"))  # Output: John
```

```
John
```
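
To see the difference between `.get()` and square brackets, this sketch looks up a key that is not in the dictionary. The key `"gpa"` is a hypothetical example chosen because it is absent.

```python
student = {"name": "John", "age": 21, "major": "Computer Science"}

# .get() returns None for a missing key instead of raising an error
print(student.get("gpa"))       # None

# A second argument supplies a fallback value
print(student.get("gpa", 0.0))  # 0.0

# Square brackets raise KeyError for a missing key
try:
    student["gpa"]
except KeyError as err:
    print("KeyError:", err)     # KeyError: 'gpa'
```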


2. Dictionary Manipulation: Add/Update/Delete:  
You can add new key-value pairs or update existing ones by assigning a value to a key, and remove a pair with `del`:

```python
# Add a new key-value pair: assigning to a key that doesn't exist yet creates it
student["grade"] = "A"

# Update an existing pair: assigning to a key that already exists replaces its value
student["age"] = 22

# Delete a key-value pair
del student["major"]
```
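
Besides `del`, two built-in dict methods are handy for manipulation. `pop()` removes a key and hands back its value. `update()` adds or overwrites several pairs in one call. The sketch starts from the state of `student` after the cell above.

```python
student = {"name": "John", "age": 22, "grade": "A"}

# pop() removes a key and returns its value; a default avoids KeyError if the key is absent
grade = student.pop("grade")
print(grade)                       # A
print(student.pop("major", None))  # None

# update() adds or overwrites several pairs at once
student.update({"age": 23, "major": "Physics"})
print(student)  # {'name': 'John', 'age': 23, 'major': 'Physics'}
```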

3. Dictionary Manipulation: Iterating through Dictionaries:  
You can loop through dictionaries to access both keys and values:

```python
for key, value in student.items():
    print(f"{key}: {value}")
```

```
name: John
age: 22
grade: A
```
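
`.items()` is not the only way to loop. This sketch loops over keys only and over values only. It then uses a dict comprehension to build a new dictionary from `.items()`; the upper-cased keys are just an illustration.

```python
student = {"name": "John", "age": 22, "grade": "A"}

# Looping over the dictionary itself yields its keys
for key in student:
    print(key)

# .values() yields only the values
for value in student.values():
    print(value)

# A dict comprehension builds a new dictionary while iterating over .items()
labels = {key.upper(): str(value) for key, value in student.items()}
print(labels)  # {'NAME': 'John', 'AGE': '22', 'GRADE': 'A'}
```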


4. Dictionariception (Nested Dictionaries):  
Dictionaries can also contain other dictionaries (nested dictionaries), which is useful for organizing more complex data.

```python
students = {
    "student1": {"name": "John", "age": 21},
    "student2": {"name": "Alice", "age": 22}
}

print(students["student1"]["name"])  # Output: John
```

```
John
```
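
Nested dictionaries are manipulated with the same syntax, one level at a time. The sketch adds a record and updates a field inside an existing record. It also chains `.get()` calls so that a missing outer key (the hypothetical `"student9"`) does not raise an error, and finally loops over the outer dictionary.

```python
students = {
    "student1": {"name": "John", "age": 21},
    "student2": {"name": "Alice", "age": 22},
}

# Add a new nested record, then update a field inside an existing one
students["student3"] = {"name": "Bob", "age": 23}
students["student1"]["age"] = 22

# Chain .get() calls to read a field without a KeyError when an outer key is missing
print(students.get("student9", {}).get("name"))  # None

# Iterate over the outer dictionary and read from each inner one
for student_id, info in students.items():
    print(student_id, info["name"], info["age"])
```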


#### Pandas 
##### Introduction to DataFrames
1. Dictionary to DataFrame:  
A DataFrame is Pandas' two-dimensional, table-like structure with labeled rows and columns. One common way to build one is from a dictionary whose keys become the column names and whose values (lists of equal length) become the column data:

```python
import pandas as pd

data = {
    "name": ["John", "Alice", "Bob"],
    "age": [21, 22, 23],
    "major": ["CS", "Math", "Biology"]
}

df = pd.DataFrame(data)
print(df)
```

```
    name  age    major
0   John   21       CS
1  Alice   22     Math
2    Bob   23  Biology
```
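
A nested dictionary like the `students` example from the Dictionaries section can also be converted. This sketch uses `pd.DataFrame.from_dict` with `orient="index"`, which treats each outer key as a row label and each inner key as a column. The name `df_nested` is just for illustration.

```python
import pandas as pd

# Nested dictionary: outer keys identify rows, inner keys identify columns
students = {
    "student1": {"name": "John", "age": 21},
    "student2": {"name": "Alice", "age": 22},
}

# orient="index" turns each outer key into a row label and each inner key into a column
df_nested = pd.DataFrame.from_dict(students, orient="index")
print(df_nested)
```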


You can also create a DataFrame from a list of dictionaries, where each dictionary represents a row:

```python
data = [
    {"name": "John", "age": 21, "major": "CS"},
    {"name": "Alice", "age": 22, "major": "Math"},
    {"name": "Bob", "age": 23, "major": "Biology"}
]

df = pd.DataFrame(data)
print(df)
```

```
    name  age    major
0   John   21       CS
1  Alice   22     Math
2    Bob   23  Biology
```
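
When the row dictionaries do not all share the same keys, Pandas takes the union of the keys as columns and fills the gaps with missing values (NaN). The sketch below leaves out `"major"` for one hypothetical record to show this.

```python
import pandas as pd

# The second record has no "major" key
rows = [
    {"name": "John", "age": 21, "major": "CS"},
    {"name": "Alice", "age": 22},
]

df_rows = pd.DataFrame(rows)
print(df_rows)
print(df_rows["major"].isna().tolist())  # [False, True]
```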


2. CSV to DataFrame:  
You can also read data from CSV files directly into a Pandas DataFrame. By default, .head() and .tail() show the first and last five rows:

```python
# Local path to the Walmart.csv file downloaded with kagglehub; adjust it for your machine
df = pd.read_csv(r"C:\Users\Renz\.cache\kagglehub\datasets\abubakkar123\walmart-stores-weekly-sales-forecasting\versions\1\Walmart.csv")
print(df.head())  # first 5 rows
print(df.tail())  # last 5 rows
```

```
   Store        Date  Weekly_Sales  Holiday_Flag  Temperature  Fuel_Price  \
0      1  05-02-2010    1643690.90             0        42.31       2.572   
1      1  12-02-2010    1641957.44             1        38.51       2.548   
2      1  19-02-2010    1611968.17             0        39.93       2.514   
3      1  26-02-2010    1409727.59             0        46.63       2.561   
4      1  05-03-2010    1554806.68             0        46.50       2.625   

          CPI  Unemployment  
0  211.096358         8.106  
1  211.242170         8.106  
2  211.289143         8.106  
3  211.319643         8.106  
4  211.350143         8.106  
      Store        Date  Weekly_Sales  Holiday_Flag  Temperature  Fuel_Price  \
6430     45  28-09-2012     713173.95             0        64.88       3.997   
6431     45  05-10-2012     733455.07             0        64.89       3.985   
6432     45  12-10-2012     734464.36             0        54.47       4.000   
6433     45  19-10-2012     718125.53
```
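
`read_csv` also accepts any file-like object, which makes it easy to experiment without the file on disk. This sketch copies a few columns of the first three rows shown above into an in-memory string. It assumes the `Date` column is day-month-year, which is inferred from values like `19-02-2010`. Under that assumption, `pd.to_datetime` is given an explicit `format` so the dates are not misread as month-day-year.

```python
import io
import pandas as pd

# A few rows copied from the Walmart output above, kept in memory so the example runs without the file
csv_text = """Store,Date,Weekly_Sales,Holiday_Flag
1,05-02-2010,1643690.90,0
1,12-02-2010,1641957.44,1
1,19-02-2010,1611968.17,0
"""

sales = pd.read_csv(io.StringIO(csv_text))

# Assumed day-month-year strings, so give to_datetime an explicit format
sales["Date"] = pd.to_datetime(sales["Date"], format="%d-%m-%Y")

print(sales.shape)    # (3, 4)
print(sales.dtypes)
print(sales.head(2))  # head(n) / tail(n) return the first / last n rows
```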

#### DataFrame Manipulation
1. Square Brackets: Accessing Columns:  
You can access individual columns using square brackets:

```python
data = [
    {"name": "John", "age": 21, "major": "CS"},
    {"name": "Alice", "age": 22, "major": "Math"},
    {"name": "Bob", "age": 23, "major": "Biology"}
]

df = pd.DataFrame(data)
print(df["name"])  # Output: Series containing names
```

```
0     John
1    Alice
2      Bob
Name: name, dtype: object
```
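
A single column name in square brackets returns a Series. A list of column names (note the double brackets) returns a DataFrame with just those columns. The sketch rebuilds the same `df` and checks both types.

```python
import pandas as pd

df = pd.DataFrame([
    {"name": "John", "age": 21, "major": "CS"},
    {"name": "Alice", "age": 22, "major": "Math"},
    {"name": "Bob", "age": 23, "major": "Biology"},
])

names = df["name"]              # one column name -> Series
subset = df[["name", "major"]]  # list of column names -> DataFrame

print(type(names).__name__)   # Series
print(type(subset).__name__)  # DataFrame
print(subset)
```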


2. Square Brackets: Accessing Rows:  
Plain square brackets with a column name select columns, so to access rows use .iloc[] (integer position) or .loc[] (index label):

```python
print(df.iloc[0])  # Access the first row
print(df.loc[0])  # Access the row with index 0 (same as iloc for this case)
```

```
name     John
age        21
major      CS
Name: 0, dtype: object
name     John
age        21
major      CS
Name: 0, dtype: object
```
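
`.loc[0]` and `.iloc[0]` only agree above because the default index labels (0, 1, 2) match the row positions. This sketch gives the rows hypothetical labels 10, 20 and 30 to show where the two diverge.

```python
import pandas as pd

df_labeled = pd.DataFrame(
    {"name": ["John", "Alice", "Bob"], "age": [21, 22, 23]},
    index=[10, 20, 30],  # row labels that no longer match positions
)

print(df_labeled.iloc[0])  # first row by position -> John
print(df_labeled.loc[10])  # row whose label is 10 -> also John
# df_labeled.loc[0] would raise KeyError here, because no row is labelled 0
```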


3. loc and iloc: Label-based and Position-based Indexing:  
loc[] is label-based, meaning you use row labels (the index) and column names to access the data:

```python
print(df.loc[0])          # Row labelled 0, all columns
print(df.loc[:, "name"])  # All rows, column 'name'
```

```
name     John
age        21
major      CS
Name: 0, dtype: object
0     John
1    Alice
2      Bob
Name: name, dtype: object
```


iloc[] is position-based, meaning you use the integer positions of rows and columns (starting at 0) to access the data:

```python
print(df.iloc[0])     # Row at position 0, all columns
print(df.iloc[:, 0])  # All rows, column at position 0
```

```
name     John
age        21
major      CS
Name: 0, dtype: object
0     John
1    Alice
2      Bob
Name: name, dtype: object
```
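
Passing both a row and a column selector picks out a single value. The sketch reads the same cell with `.loc` (labels) and `.iloc` (positions). It also shows `.at` and `.iat`, the scalar-only counterparts of `.loc` and `.iloc`.

```python
import pandas as pd

df = pd.DataFrame([
    {"name": "John", "age": 21, "major": "CS"},
    {"name": "Alice", "age": 22, "major": "Math"},
    {"name": "Bob", "age": 23, "major": "Biology"},
])

# Row label 1, column label "major"
print(df.loc[1, "major"])  # Math
# Row position 1, column position 2
print(df.iloc[1, 2])       # Math
# .at and .iat access exactly one value, by label and by position
print(df.at[2, "name"])    # Bob
print(df.iat[2, 0])        # Bob
```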


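4. loc and iloc: Slicing Rows and Columns:  
You can slice both rows and columns with .loc[] and .iloc[]. A .loc[] slice includes its end label, while an .iloc[] slice excludes its end position (like a Python list slice), so `0:1` returns two rows with .loc[] but one row with .iloc[]: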

```python
# Using loc for slicing by label
print(df.loc[0:1, ["name", "age"]])  # First two rows, 'name' and 'age' columns

# Using iloc for slicing by position
print(df.iloc[0:1, [0, 1]])  # First row, 'name' and 'age' columns
```

```
    name  age
0   John   21
1  Alice   22
   name  age
0  John   21
```
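
The inclusive/exclusive difference also applies to column slices. This sketch uses a small hypothetical `df_slice` and counts what each slice returns.

```python
import pandas as pd

df_slice = pd.DataFrame({"name": ["John", "Alice", "Bob"], "age": [21, 22, 23]})

# loc includes the end label of a slice; iloc excludes the end position
print(len(df_slice.loc[0:1]))   # 2
print(len(df_slice.iloc[0:1]))  # 1

# Column slices follow the same rules
print(df_slice.loc[:, "name":"age"].columns.tolist())  # ['name', 'age']
print(df_slice.iloc[:, 0:1].columns.tolist())          # ['name']
```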


5. loc and iloc: Filtering DataFrames:
You can filter DataFrames based on conditions:

```python
# Using loc to filter by condition
print(df.loc[df["age"] > 21])  # All columns for students older than 21
print(df.loc[df["age"] > 21, ["name", "age"]])  # Only name and age for students older than 21

# iloc does not accept a boolean Series directly, so here it simply selects the first two rows by position
print(df.iloc[0:2])
```

```
    name  age    major
1  Alice   22     Math
2    Bob   23  Biology
    name  age
1  Alice   22
2    Bob   23
    name  age major
0   John   21    CS
1  Alice   22  Math
```
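
Conditions can be combined with `&` (and) and `|` (or), with each condition wrapped in its own parentheses. `.isin()` tests membership in a list of values. Although `.iloc[]` rejects a boolean Series, it does accept a plain boolean NumPy array, for example one obtained with `.to_numpy()`. The sketch below shows all three on the same `df`.

```python
import pandas as pd

df = pd.DataFrame([
    {"name": "John", "age": 21, "major": "CS"},
    {"name": "Alice", "age": 22, "major": "Math"},
    {"name": "Bob", "age": 23, "major": "Biology"},
])

# Combine conditions with & / |; each condition needs its own parentheses
older_non_cs = df.loc[(df["age"] > 21) & (df["major"] != "CS"), "name"]
print(older_non_cs.tolist())  # ['Alice', 'Bob']

# isin() tests membership in a list of values
print(df.loc[df["major"].isin(["CS", "Math"]), "name"].tolist())  # ['John', 'Alice']

# iloc accepts a plain boolean array
mask = (df["age"] > 21).to_numpy()
print(df.iloc[mask, 0].tolist())  # ['Alice', 'Bob']
```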


#### Practice Project: Working with Data
Objective:  
The goal of this project is to create and manipulate a DataFrame using Python dictionaries, and practice your ability to access and filter data.  

Dataset: Create a dataset with the following columns: "name", "age", "major", "GPA".  

Steps to Follow:
1. Create a dictionary for the dataset: map each column name to a list of values.
2. Convert the dictionary to a Pandas DataFrame: Use pd.DataFrame(data) to convert the dictionary to a DataFrame.
3. Perform DataFrame operations:
- Access a single column (e.g., "name").
- Access multiple columns (e.g., "name" and "GPA").
- Filter rows where GPA is greater than 3.5.
- Add a new column that contains the graduation year (assuming students graduate in 4 years).
- Use loc[] and iloc[] to access specific rows and columns.

```python
ds = {"name": ["Barb", "Ton", "Jose", "Melv", "Fred", "Mika"],
      "age": [19, 22, 20, 18, 23, 24],
      "major": ["Biology", "Physics", "Aviation", "Accountancy", "Economics", "Political Science"],
      "GPA": [2.9, 3.9, 3.7, 3.2, 3.8, 3.5]}
daf = pd.DataFrame(ds)                   # Dictionary -> DataFrame
print(daf.loc[0, "name"])                # Single value: row 0, column "name"
print(daf.loc[:, ["name", "GPA"]])       # Multiple columns
print(daf.loc[daf["GPA"] > 3.5])         # Rows with GPA above 3.5
daf["Grad Yr."] = [2027, 2025, 2026, 2028, 2024, 2023]     # New graduation-year column
print(daf.loc[daf["Grad Yr."] < 2026, ["name", "GPA"]])  # Students graduating before 2026
print(daf.iloc[:, -1])                   # Last column, by position
```

```
Barb
   name  GPA
0  Barb  2.9
1   Ton  3.9
2  Jose  3.7
3  Melv  3.2
4  Fred  3.8
5  Mika  3.5
   name  age      major  GPA
1   Ton   22    Physics  3.9
2  Jose   20   Aviation  3.7
4  Fred   23  Economics  3.8
   name  GPA
1   Ton  3.9
4  Fred  3.8
5  Mika  3.5
0    2027
1    2025
2    2026
3    2028
4    2024
5    2023
Name: Grad Yr., dtype: int64
```


#### Bonus Problem:
- Create a new column in the DataFrame that classifies students into categories based on their GPA: "Excellent" (GPA >= 3.8), "Good" (3.0 <= GPA < 3.8), and "Needs Improvement" (GPA < 3.0).
- Then, group the students by this new category and calculate the average age and GPA for each category.

```python
# Classify each student by GPA
daf['Remarks'] = daf['GPA'].apply(lambda gpa: 'Excellent' if gpa >= 3.8
                                  else ('Good' if gpa >= 3.0
                                  else 'Needs Improvement'))
print(daf)
# Show each category without the Remarks column
print("\nExcellent Students\n", daf.loc[daf["Remarks"] == "Excellent", daf.columns != "Remarks"])
print("\nGood Students\n", daf.loc[daf["Remarks"] == "Good", daf.columns != "Remarks"])
print("\nStudents Who Need Improvement\n", daf.loc[daf["Remarks"] == "Needs Improvement", daf.columns != "Remarks"])
```

```
   name  age              major  GPA  Grad Yr.            Remarks
0  Barb   19            Biology  2.9      2027  Needs Improvement
1   Ton   22            Physics  3.9      2025          Excellent
2  Jose   20           Aviation  3.7      2026               Good
3  Melv   18        Accountancy  3.2      2028               Good
4  Fred   23          Economics  3.8      2024          Excellent
5  Mika   24  Political Science  3.5      2023               Good

Excellent Students
    name  age      major  GPA  Grad Yr.
1   Ton   22    Physics  3.9      2025
4  Fred   23  Economics  3.8      2024

Good Students
    name  age              major  GPA  Grad Yr.
2  Jose   20           Aviation  3.7      2026
3  Melv   18        Accountancy  3.2      2028
5  Mika   24  Political Science  3.5      2023

Students Who Need Improvement
    name  age    major  GPA  Grad Yr.
0  Barb   19  Biology  2.9      2027
```
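
The second part of the bonus problem (average age and GPA per category) can be handled with `groupby`. `groupby("Remarks")` splits the rows by category, and `.mean()` then averages each selected column within each group. This is a minimal sketch that rebuilds only the columns it needs under the hypothetical name `gpa_df`, so the existing `daf` is left untouched.

```python
import pandas as pd

# Same students and GPA rules as the bonus solution above
gpa_df = pd.DataFrame({
    "name": ["Barb", "Ton", "Jose", "Melv", "Fred", "Mika"],
    "age": [19, 22, 20, 18, 23, 24],
    "GPA": [2.9, 3.9, 3.7, 3.2, 3.8, 3.5],
})
gpa_df["Remarks"] = gpa_df["GPA"].apply(
    lambda gpa: "Excellent" if gpa >= 3.8 else ("Good" if gpa >= 3.0 else "Needs Improvement")
)

# Split rows by category, then average age and GPA within each group
summary = gpa_df.groupby("Remarks")[["age", "GPA"]].mean()
print(summary)
```

For this data, the "Excellent" group averages 22.5 years and a 3.85 GPA, and the single "Needs Improvement" student contributes an average age of 19.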
