This notebook covers the most commonly used pandas DataFrame assertion functions:

1. `assert_pd_dataframe_variable_column_equals_csv()`
2. `assert_pd_dataframe_variable_equals_csv()`
3. `assert_pd_dataframe_variable_equals_variable()`
4. `assert_pd_dataframe_csv_equals_csv()`
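
The helpers themselves live in `utils.py`, which is not shown in this notebook. As a rough sketch of the kind of check a DataFrame-versus-CSV assertion might perform, the snippet below compares an in-memory frame with CSV text using `pandas.testing.assert_frame_equal`. The function name `frame_matches_csv` and the in-memory CSV are illustrative assumptions, not the actual implementation in `utils.py`.

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal


def frame_matches_csv(frame, csv_source, read_csv_kwargs=None):
    """Hypothetical helper: return True if `frame` equals the CSV contents."""
    expected = pd.read_csv(csv_source, **(read_csv_kwargs or {}))
    try:
        assert_frame_equal(frame, expected)
    except AssertionError:
        return False
    return True


# Toy data standing in for a student's DataFrame and a stored solution file.
student = pd.DataFrame({"title": ["A", "B"], "rating": [4.2, 3.9]})
solution_csv = student.to_csv(index=True)

# The solution was saved with its index, so it is read back with index_col=0.
matches = frame_matches_csv(student, io.StringIO(solution_csv), {"index_col": 0})

# Without index_col=0 the saved index shows up as an extra column.
mismatch = frame_matches_csv(student, io.StringIO(solution_csv))
```

This is probably why most assertions below pass `read_csv_kwargs = {'index_col': 0}`: the solution files are written with `index=True`.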

In [2]:
# Load the assertion helpers defined in utils.py into the notebook namespace
exec(open("utils.py").read())

### Import the libraries and load the dataset

In [3]:
import pandas as pd

df = pd.read_csv('Best_Books_Ever.csv')

### Activities

##### Activity 1. Calculating the Price-to-Rating Ratio

Create a new column `price_to_rating` in the DataFrame that holds the price-to-rating ratio for each book. This ratio shows how the price of a book relates to its average rating.
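
As a small illustration on made-up data (the values below are not taken from `Best_Books_Ever.csv`), dividing one column by another works element by element. The zero rating is included only to show what happens in that edge case.

```python
import pandas as pd

# Toy frame; the column names mirror the ones used in this notebook.
books = pd.DataFrame({"price": [10.0, 4.5, 12.0], "rating": [4.0, 4.5, 0.0]})

# Dividing two Series is element-wise and aligned on the index.
books["price_to_rating"] = books["price"] / books["rating"]

# A positive price divided by a rating of 0.0 yields inf rather than an error.
zero_rating_ratio = books["price_to_rating"].iloc[2]
```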

Solution:

In [8]:
df['price_to_rating'] = df['price'] / df['rating']

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_01.csv', index=True)

Assertions:

In [9]:
assert_pd_dataframe_variable_column_equals_csv('df', 'price_to_rating', 'sol_01.csv')

##### Activity 2. Remove the "isbn" Column

The "isbn" column is not needed for our analysis. Write a script to remove this column from the dataframe.
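
A minimal sketch on toy data of how `drop` behaves with and without `inplace=True`. The frame and values are invented for illustration.

```python
import pandas as pd

books = pd.DataFrame({"title": ["A"], "isbn": ["111"], "rating": [4.2]})

# Without inplace=True, drop() returns a new frame and leaves `books` unchanged.
trimmed = books.drop(columns="isbn")

# errors="ignore" makes the call safe to re-run after the column is already gone.
trimmed_again = trimmed.drop(columns="isbn", errors="ignore")
```

In a notebook, re-running a cell that drops a column in place raises a `KeyError` the second time, which is where `errors="ignore"` can help.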

Solution:

In [10]:
df.drop(columns='isbn', inplace=True)

In [11]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_02.csv', index=True)

Assertions:

In [12]:
assert_pd_dataframe_variable_equals_csv('df', 'sol_02.csv')

##### Activity 3. Extract the Publication Year

Write a script to extract the publication year from the `publishDate` column and create a new column named `YearPublished` in the dataframe.
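
The sketch below shows what `str.extract` returns on a few made-up date strings. The formats are assumptions and may not match every value in `publishDate`.

```python
import pandas as pd

dates = pd.Series(["2003-09-30", "September 14th 2008", "unknown"])

# expand=False returns a Series for a single capture group.
year_text = dates.str.extract(r"(\d{4})", expand=False)

# The extracted values are strings; rows without a 4-digit run become NaN.
# The nullable Int64 dtype stores integers while keeping the missing value.
year = pd.to_numeric(year_text).astype("Int64")
```

Note that the extracted column holds text until it is converted, which matters when the comparison file is read with an integer dtype.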

Solution:

In [87]:
df['YearPublished'] = df['publishDate'].str.extract(r'(\d{4})')

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_03.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0, 'dtype': {'YearPublished': 'int32'}}
assert_pd_dataframe_variable_column_equals_csv('df', 'YearPublished', 'sol_03.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 4. Filter Books with Ratings Above 4.5

Create a new dataframe that includes only books with ratings above 4.5. Name this new dataframe `best_books`.
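
A short illustration on toy data of how the choice of comparison operator affects a boundary value such as a rating of exactly 4.5:

```python
import pandas as pd

books = pd.DataFrame({"title": ["A", "B", "C"], "rating": [4.4, 4.5, 4.8]})

# "Above 4.5" is a strict comparison, so a rating of exactly 4.5 is excluded.
above = books[books["rating"] > 4.5]

# ">=" would also keep the book rated exactly 4.5.
at_or_above = books[books["rating"] >= 4.5]
```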

Solution:

In [88]:
best_books = df[df['rating'] > 4.5]

In [None]:
# save the dataframe to a new csv file
best_books.to_csv('activity_solutions_files/sol_04.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0, 'dtype': {'YearPublished': 'int32'}}
assert_pd_dataframe_variable_equals_csv('best_books', 'sol_04.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 5. Drop Books with Fewer than 100 Pages

Some entries in the dataset might represent short stories or other short works. For this activity, remove all rows from the dataframe where the number of pages is less than 100.
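
One detail worth keeping in mind, sketched here on invented data: a row whose page count is missing never satisfies the comparison, so it is removed together with the short books.

```python
import numpy as np
import pandas as pd

books = pd.DataFrame({"title": ["A", "B", "C"], "pages": [96.0, 320.0, np.nan]})

# Comparisons with NaN evaluate to False, so rows with an unknown page
# count are dropped along with the short books.
long_books = books[books["pages"] >= 100]

# The surviving row keeps its original label; the index is not renumbered.
remaining_labels = long_books.index.tolist()
```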

Solution:

In [93]:
df = df[df['pages'] >= 100]

This line of code removes all rows from the dataframe where the number of pages is less than 100.

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_05.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0, 'dtype': {'YearPublished': 'int32'}}
assert_pd_dataframe_variable_equals_csv('df', 'sol_05.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 6. Flag Books with Multiple Awards

Create a new column `MultipleAwards` that flags books that have won multiple awards. If a book has won more than one award, set the value of `MultipleAwards` to `True`; otherwise, set it to `False`.
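
The sketch below assumes that `awards` stores each list as its string representation, for example `"['Hugo Award']"`. The award names are placeholders. It uses `ast.literal_eval`, which parses Python literals without executing arbitrary code.

```python
import ast

import pandas as pd

# Assumption: each cell holds the string form of a Python list.
awards = pd.Series(["['Hugo Award', 'Nebula Award']", "['Booker Prize']", "[]"])

# Parse each string into a list, then flag lists with more than one entry.
multiple = awards.apply(lambda text: len(ast.literal_eval(text)) > 1)
```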

Solution:

In [95]:
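import ast

# ast.literal_eval safely parses the stringified list; eval would run arbitrary code
df['MultipleAwards'] = df['awards'].apply(lambda x: len(ast.literal_eval(x)) > 1)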

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_06.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0}
assert_pd_dataframe_variable_column_equals_csv('df', 'MultipleAwards', 'sol_06.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 7. Adding a New Book Entry

Add a new book entry to the dataframe with the following details:

```python
new_book = {
    "bookID": 10000,
    "title": "The Great Gatsby",
    "author": "F. Scott Fitzgerald",
    "rating": 3.9,
    "pages": 180,
    "publishDate": "2003-09-30",
    "publisher": "Scribner",
    "price": 7.99,
    "genres": "['Fiction', 'Classics']",
    "YearPublished": 2003,
    "GenreCount": 2,
    "FirstName": "F.",
    "LastName": "Fitzgerald",
    "PrimaryGenre": "Fiction",
    "MultipleAwards": False,
    "ReadingTimeHours": 9.0,
    "Published21stCentury": True
}
```

> Add this new entry to the index `len(df)`.

Solution:

In [104]:
new_book = {
    "bookID": 10000,
    "title": "The Great Gatsby",
    "author": "F. Scott Fitzgerald",
    "rating": 3.9,
    "pages": 180,
    "publishDate": "2003-09-30",
    "publisher": "Scribner",
    "price": 7.99,
    "genres": "['Fiction', 'Classics']",
    "YearPublished": 2003,
    "GenreCount": 2,
    "FirstName": "F.",
    "LastName": "Fitzgerald",
    "PrimaryGenre": "Fiction",
    "MultipleAwards": False,
    "ReadingTimeHours": 9.0,
    "Published21stCentury": True
}
new_df = pd.DataFrame(new_book, index=[len(df)])
df = pd.concat([df, new_df])
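
One thing worth checking when using `len(df)` as the new label: if rows were filtered out earlier without resetting the index, that label may already be in use. The toy example below, with invented data, shows how a duplicate label can arise.

```python
import pandas as pd

books = pd.DataFrame({"title": ["A", "B", "C", "D"], "pages": [50, 120, 300, 80]})

# Filtering keeps the original labels, so the index is now [1, 2], not [0, 1].
books = books[books["pages"] >= 100]

new_row = pd.DataFrame({"title": ["E"], "pages": [180]}, index=[len(books)])
combined = pd.concat([books, new_row])

# len(books) is 2, which is already used as a label, so the index has a duplicate.
has_duplicate_label = not combined.index.is_unique
```

If unique labels matter, `combined.index.is_unique` is a quick check, and passing `ignore_index=True` to `pd.concat` is one way to renumber the rows.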

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_07.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0}
assert_pd_dataframe_variable_equals_csv('df', 'sol_07.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 8. Save the Updated Dataframe in a New CSV File

Save the updated dataframe `df` in a new CSV file named `updated_best_book.csv`. Save this file in the current directory.

> Make sure not to reset the index

Solution:

In [14]:
# save the dataframe to a new csv file
df.to_csv('updated_best_book.csv', index=False)
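
For reference, here is a small round trip on toy data showing how the `index` argument of `to_csv` affects what comes back from `read_csv`. It uses in-memory text instead of files.

```python
import io

import pandas as pd

books = pd.DataFrame({"title": ["B", "C"]}, index=[1, 2])

# index=True writes the row labels as the first, unnamed column.
with_index = pd.read_csv(io.StringIO(books.to_csv(index=True)), index_col=0)

# index=False omits the labels, so reading back yields a fresh 0..n-1 index.
without_index = pd.read_csv(io.StringIO(books.to_csv(index=False)))
```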

In [17]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_08.csv', index=False)

Assertions:

In [18]:
read_csv_kwargs = {'index_col': 0}
assert_pd_dataframe_csv_equals_csv('updated_best_book.csv', 'sol_08.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 9. Save the Dataframe in a CSV File

Convert the dictionary below into a dataframe named `student_df`, then save that dataframe to a CSV file called `student_data.csv`.

```python
student_dict = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [18, 17, 19, 18, 17],
    'Grade': ['A', 'B', 'A', 'B', 'A']
}
```

Solution:

In [20]:
import pandas as pd

# Step 1: Given dictionary
student_dict = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [18, 17, 19, 18, 17],
    'Grade': ['A', 'B', 'A', 'B', 'A']
}

# Step 2: Convert the dictionary into a DataFrame
student_df = pd.DataFrame(student_dict)

# Step 3: Save the DataFrame to a CSV file
student_df.to_csv('student_data.csv', index=False)


In [21]:
student_df.to_csv('activity_solutions_files/sol_09.csv', index=False)

Assertions:

In [23]:
student_dict = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [18, 17, 19, 18, 17],
    'Grade': ['A', 'B', 'A', 'B', 'A']
}

expected_student_df = pd.DataFrame(student_dict)
assert_pd_dataframe_variable_equals_variable('student_df', 'expected_student_df', delete_afterwards=True)

assert_pd_dataframe_csv_equals_csv('student_data.csv', 'sol_09.csv')

### The End!