Here's a Python task idea for today:

**Task**: Working with Dictionaries and Lists in Python

**Objective**: 
You will create a Python program that processes a list of student dictionaries, each containing information about a student's name, age, grades, and participation level. The goal is to calculate the average grade of all students, filter students who are above or below the average grade, and sort the filtered list by their participation level.

### Steps:
1. Create a list of dictionaries where each dictionary represents a student. Each dictionary should contain:
   - 'name' (string): The student's name.
   - 'age' (int): The student's age.
   - 'grades' (list of ints): A list of grades (e.g., [78, 85, 90]).
   - 'participation' (string): Level of participation (e.g., 'high', 'medium', 'low').

2. Write a function to calculate each student's average grade.

3. Calculate the overall average grade of all students.

4. Write a function to filter students who are above or below the overall average grade.

5. Sort the filtered list by their participation level ('high' > 'medium' > 'low').

6. Print the list of students in each filtered category (above and below the average) with their names, average grades, and participation level.

### Example:
```python
students = [
    {'name': 'John', 'age': 20, 'grades': [85, 90, 82], 'participation': 'high'},
    {'name': 'Alice', 'age': 22, 'grades': [70, 75, 72], 'participation': 'medium'},
    {'name': 'Bob', 'age': 19, 'grades': [60, 68, 65], 'participation': 'low'},
    {'name': 'Emma', 'age': 21, 'grades': [92, 88, 95], 'participation': 'high'},
]

# Steps 2-6 implemented here
```

This will help you practice using dictionaries, lists, and basic data manipulation in Python.

In [2]:
students = [
    {'name': 'John', 'age': 20, 'grades': [85, 90, 82], 'participation': 'high'},
    {'name': 'Alice', 'age': 22, 'grades': [70, 75, 72], 'participation': 'medium'},
    {'name': 'Bob', 'age': 19, 'grades': [60, 68, 65], 'participation': 'low'},
    {'name': 'Emma', 'age': 21, 'grades': [92, 88, 95], 'participation': 'high'},
]

In [21]:
# Loop through each student in the list and print their name and average grade
for student in students:
    # Calculate and print the average grade for each student
    print(f"{student['name']} has an average grade of {sum(student['grades'])/len(student['grades']):.2f}")

John has an average grade of 85.67
Alice has an average grade of 72.33
Bob has an average grade of 64.33
Emma has an average grade of 91.67


In [22]:
# Calculate the overall average grade of all students
avg_grade = 0
num_students = 0

for student in students:
    # Add each student's average grade to the total
    avg_grade += sum(student['grades']) / len(student['grades'])
    num_students += 1

# Calculate the overall average by dividing the total by the number of students
avg_grade /= num_students

# Print the overall average grade, formatted to 2 decimal places
print(f'The average student grade is {avg_grade:.2f}')

The average student grade is 78.50


In [23]:
# Function to filter students who are above or below the overall average grade
def compare_average_grade(students, grade):
    for student in students:
        # Calculate each student's average grade
        avg_grade = sum(student['grades']) / len(student['grades'])
        
        # Compare the student's average grade with the overall average
        if avg_grade > grade:
            print(f"{student['name']} has an above average grade")
        elif avg_grade < grade:
            print(f"{student['name']} has a below average grade")
        else:
            print(f"{student['name']} grade is the average grade")

# Call the function and pass the list of students and the overall average grade
compare_average_grade(students, avg_grade)

John has an above average grade
Alice has a below average grade
Bob has a below average grade
Emma has an above average grade


Sort the filtered list by their participation level ('high' > 'medium' > 'low').

Print the list of students in each filtered category (above and below the average) with their names, average grades, and participation level.

In [45]:
# Function to compare average grades and sort by participation level
def compare_and_sort_students(students, grade):
    above_average = []
    below_average = []

    for student in students:
        avg_grade = sum(student['grades']) / len(student['grades'])
        
        if avg_grade > grade:
            above_average.append(student)
        elif avg_grade < grade:
            below_average.append(student)

    # Assign values for sorting based on participation level
    participation_level = {'high': 3, 'medium': 2, 'low': 1}

    # Sort the above_average and below_average lists by participation level
    above_average.sort(key=lambda x: participation_level[x['participation']], reverse=True)
    below_average.sort(key=lambda x: participation_level[x['participation']], reverse=True)

    # Print the results
    print("Students with above average grades:")
    for student in above_average:
        print(f"{student['name']} with participation level {student['participation']}")

    print("\nStudents with below average grades:")
    for student in below_average:
        print(f"{student['name']} with participation level {student['participation']}")

# Call the function
compare_and_sort_students(students, avg_grade)

Students with above average grades:
John with participation level high
Emma with participation level high

Students with below average grades:
Alice with participation level medium
Bob with participation level low


In [46]:
messy_data = {
    'StudentID': [101, 102, 103, None, 105, 106, 107, 108],
    'Name': ['Alice', 'Bob', 'CHARLIE', 'Diana', 'eve', 'FRED', None, 'George123'],
    'Age': ['20', '21', '19', '20', '22', 'twenty', '23', 'N/A'],
    'Grades': ['A', 'B', 'C', 'C', 'B', 'A', 'D', 'E'],
    'Participation': ['high', 'medium', 'medium', 'low', 'High', 'high', None, 'Medium'],
    'GPA': ['3.5', '4.0', None, '2.8', '3.2', 'three point five', '2.9', 'N/A'],
    'Extracurriculars': ['Drama, Sports', 'Sports', None, 'Art', 'Drama, Music', 'Sports', 'Music, Drama', 'Art, Sports'],
    'EnrollmentDate': ['2020-09-01', '2021/01/15', '2021-09-10', '2020-10-01', None, '2020-09-20', '2021-08-30', '2020-01-01']
}


In [218]:
import pandas as pd
from word2number import w2n
import re
import numpy as np
import warnings
warnings.simplefilter("ignore")

In [223]:
def standardize_date(date):
    try:
        return pd.to_datetime(date, errors='coerce')  # Convert to datetime, coercing errors to NaT
    except Exception as e:
        return None  # Return None if an error occurs

def convert_to_number(text):
    if text is None:
        return None
    elif isinstance(text, str) and text.isnumeric():
        if float(text).is_integer():
            return int(text)
        else:
            return float(text)
    else:
        try:
            return w2n.word_to_num(text)  # Convert word representation to number
        except ValueError:
            return text

def wrangle(data):
    # Convert the input data to a DataFrame
    df = pd.DataFrame(data)

    # Standardize the 'Name' column: Convert names to title case
    df["Name"] = df["Name"].str.title()
    
    # Clean the 'Participation' column: Capitalize the first letter if it's a string
    df["Participation"] = df["Participation"].apply(lambda x: x.capitalize() if isinstance(x, str) else x)

    # Handle missing or 'N/A' values in the 'GPA' column and convert text to numbers
    df["GPA"] = df["GPA"].apply(lambda x: x if x is not None and x != 'N/A' else None).apply(convert_to_number)
    
    # Handle missing or 'N/A' values in the 'Age' column and convert text to numbers
    df["Age"] = df["Age"].apply(lambda x: x if x is not None and x != 'N/A' else None).apply(convert_to_number)

    # Clean the 'Name' column: Remove any non-alphabet characters
    df["Name"] = df["Name"].apply(lambda x: re.sub(r'[^a-zA-Z]', '', x) if x else x)
    #df["Name"] = df["Name"].apply(lambda x: x if x is None else 
                    #(x if not re.findall(r'\d+', x) else ''.join(filter(lambda alphabet: alphabet.isalpha(), x))))

    # Standardize the 'EnrollmentDate' column: Convert various date formats to a standard datetime format
    df['EnrollmentDate'] = df['EnrollmentDate'].apply(standardize_date)

    # Handle missing values in 'StudentID' and 'Age'
    df["StudentID"] = df["StudentID"].fillna(-1).astype(int)
    df["Age"] = df["Age"].fillna(-1).astype(int)

    #df["StudentID"].replace(-1, np.nan, inplace=True)
    #df["Age"].replace(-1, np.nan, inplace=True)

    df = df.fillna(value=np.nan)
    return df

df = wrangle(messy_data)
df.head(10)

Unnamed: 0,StudentID,Name,Age,Grades,Participation,GPA,Extracurriculars,EnrollmentDate
0,101,Alice,20,A,High,3.5,"Drama, Sports",2020-09-01
1,102,Bob,21,B,Medium,4.0,Sports,2021-01-15
2,103,Charlie,19,C,Medium,,,2021-09-10
3,-1,Diana,20,C,Low,2.8,Art,2020-10-01
4,105,Eve,22,B,High,3.2,"Drama, Music",NaT
5,106,Fred,20,A,High,3.5,Sports,2020-09-20
6,107,,23,D,,2.9,"Music, Drama",2021-08-30
7,108,George,-1,E,Medium,,"Art, Sports",2020-01-01


In [220]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   StudentID         7 non-null      float64       
 1   Name              7 non-null      object        
 2   Age               7 non-null      float64       
 3   Grades            8 non-null      object        
 4   Participation     7 non-null      object        
 5   GPA               6 non-null      object        
 6   Extracurriculars  7 non-null      object        
 7   EnrollmentDate    7 non-null      datetime64[ns]
dtypes: datetime64[ns](1), float64(2), object(5)
memory usage: 644.0+ bytes
