### **Note**: The original dataset at location `/data/goodreads-best-book-ever-dataset/Best_Books_Ever.csv` contains null and invalid values. Since this project activity is part of the `DataFrame Mutations in Intro to Data Analysis Track`, and students at this level may not have knowledge of null and invalid values, the original dataset has been transformed to `Best_Books_Ever.csv`, which is used in this POC notebook.

In [2]:
exec(open("utils.py").read())

### Introduction

In this coding lab, you will learn about creating new columns, deleting rows and columns, modifying the structure of a dataframe, adding new rows, and using the `inplace` parameter. We will work with the "Best Books Ever" dataset from the GoodReads BBE Dataset repository, which contains detailed information on books listed as the best on Goodreads. This dataset includes fields such as book identifiers, titles, authors, publication years, ratings, and more, making it an invaluable resource for academic and research projects focused on analyzing literature trends, reader preferences, and book metadata for various applications in data science and literary studies.

By working through this project, you will gain practical experience in data manipulation and analysis with the pandas library.

> **Important Note**: Some of these activities modify the original dataframe in place. Once data has been mutated, you cannot revert to a previous state; if you make a mistake, you may need to start over from the beginning. In-place mutation is generally discouraged because it is irreversible, but this project deliberately exposes you to these "bad practices", which are still prevalent in the industry. Because the activities require mutation, keep your notebook exceptionally clean and organized: you may need to rerun everything from scratch (from Cell 1, reading the data, all the way down) after a mistake.

This project has been thoroughly tested and is functional. If your solution does not work, it is likely due to a mistake in one of your previous steps.

### Import the libraries and load the dataset

In [3]:
import pandas as pd

df = pd.read_csv('Best_Books_Ever.csv')

In [4]:
df.head()

Unnamed: 0,bookId,title,series,author,rating,description,language,isbn,genres,characters,...,firstPublishDate,awards,numRatings,ratingsByStars,likedPercent,setting,coverImg,bbeScore,bbeVotes,price
0,2.Harry_Potter_and_the_Order_of_the_Phoenix,Harry Potter and the Order of the Phoenix,Harry Potter #5,"J.K. Rowling, Mary GrandPré (Illustrator)",4.5,There is a door at the end of a silent corrido...,English,9780439358071,"['Fantasy', 'Young Adult', 'Fiction', 'Magic',...","['Sirius Black', 'Draco Malfoy', 'Ron Weasley'...",...,2003-06-21,['Bram Stoker Award for Works for Young Reader...,2507623,"['1593642', '637516', '222366', '39573', '14526']",98.0,['Hogwarts School of Witchcraft and Wizardry (...,https://i.gr-assets.com/images/S/compressed.ph...,2632233,26923,7.38
1,30.J_R_R_Tolkien_4_Book_Boxed_Set,J.R.R. Tolkien 4-Book Boxed Set: The Hobbit an...,The Lord of the Rings #0-3,J.R.R. Tolkien,4.6,"This four-volume, boxed set contains J.R.R. To...",English,9780345538376,"['Fantasy', 'Fiction', 'Classics', 'Adventure'...","['Frodo Baggins', 'Gandalf', 'Bilbo Baggins', ...",...,2055-10-20,[],110146,"['78217', '22857', '6628', '1477', '967']",98.0,['Middle-earth'],https://i.gr-assets.com/images/S/compressed.ph...,1159802,12111,21.15
2,375802.Ender_s_Game,Ender's Game,Ender's Saga #1,"Orson Scott Card, Stefan Rudnicki (Narrator), ...",4.3,"Andrew ""Ender"" Wiggin thinks he is playing com...",English,9780812550702,"['Science Fiction', 'Fiction', 'Young Adult', ...","['Dink', 'Bernard', 'Valentine Wiggin', 'Peter...",...,1985-10-28,"['Hugo Award for Best Novel (1986)', 'Nebula A...",1131303,"['603209', '339819', '132305', '35667', '20303']",95.0,[],https://i.gr-assets.com/images/S/compressed.ph...,720651,7515,4.6
3,17245.Dracula,Dracula,Dracula #1,"Bram Stoker, Nina Auerbach (Editor), David J. ...",4.0,You can find an alternative cover edition for ...,English,9780393970128,"['Classics', 'Horror', 'Fiction', 'Fantasy', '...","['Jonathan Harker', 'Lucy Westenra', 'Abraham ...",...,1997-05-26,[],938325,"['345260', '329217', '197206', '48642', '18000']",93.0,"['Transylvania (Romania)', 'Budapest (Hungary)...",https://i.gr-assets.com/images/S/compressed.ph...,646782,6988,4.55
4,28187.The_Lightning_Thief,The Lightning Thief,Percy Jackson and the Olympians #1,Rick Riordan (Goodreads Author),4.26,Alternate cover for this ISBN can be found her...,English,9780786838653,"['Fantasy', 'Young Adult', 'Mythology', 'Ficti...","['Annabeth Chase', 'Grover Underwood', 'Luke C...",...,2005-06-28,"[""Young Readers' Choice Award (2008)"", 'Books ...",1992300,"['1006885', '604999', '289310', '64014', '27092']",95.0,"['New York City, New York (United States)', 'M...",https://i.gr-assets.com/images/S/compressed.ph...,597132,6370,1.79


In [5]:
df.shape

(794, 25)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 794 entries, 0 to 793
Data columns (total 25 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   bookId            794 non-null    object 
 1   title             794 non-null    object 
 2   series            794 non-null    object 
 3   author            794 non-null    object 
 4   rating            794 non-null    float64
 5   description       794 non-null    object 
 6   language          794 non-null    object 
 7   isbn              794 non-null    int64  
 8   genres            794 non-null    object 
 9   characters        794 non-null    object 
 10  bookFormat        794 non-null    object 
 11  edition           794 non-null    object 
 12  pages             794 non-null    int64  
 13  publisher         794 non-null    object 
 14  publishDate       794 non-null    object 
 15  firstPublishDate  794 non-null    object 
 16  awards            794 non-null    object 
 1

### Activities

##### Activity 1. Calculating the Price-to-Rating Ratio

Create a new column `price_to_rating` in the DataFrame that holds the price-to-rating ratio for each book. This ratio will help us understand how the price of a book relates to its average rating.

**Hint**: You can create a new column by assigning the result of dividing the `price` column by the `rating` column.

Solution:

In [8]:
df['price_to_rating'] = df['price'] / df['rating']

This activity demonstrates how to create new columns in a DataFrame, a fundamental data manipulation task. It also introduces the concept of deriving new insights from existing data.
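
As a minimal sketch of the same idea, the snippet below builds a derived column on a tiny made-up dataframe. The name `toy` and its values are assumptions for illustration only and do not come from the dataset.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from Best_Books_Ever.csv)
toy = pd.DataFrame({"price": [9.0, 4.0], "rating": [4.5, 4.0]})

# Dividing two columns works element-wise: one ratio per row
toy["price_to_rating"] = toy["price"] / toy["rating"]

print(toy)
```

Assigning to a column name that does not exist yet creates the column; assigning to an existing name overwrites it.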

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_01.csv', index=True)

Assertions:

In [9]:
assert_pd_dataframe_variable_column_equals_csv('df', 'price_to_rating', 'sol_01.csv')

##### Activity 2. Remove the "isbn" Column

The "isbn" column is not needed for our analysis. Write a script to remove this column from the dataframe.

**Hint**: You can remove a column from a DataFrame using the `drop` method.

Solution:

In [10]:
df.drop(columns='isbn', inplace=True)

This line of code removes the `isbn` column from the dataframe and applies the change directly to the dataframe by using `inplace=True`.
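
To make the effect of `inplace` concrete, here is a small sketch on a made-up dataframe (`toy`, `without_isbn`, and `result` are hypothetical names). It contrasts the default behaviour, which returns a new dataframe, with `inplace=True`, which changes the existing one.

```python
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({"title": ["A", "B"], "isbn": [111, 222]})

# Default: drop returns a new DataFrame and leaves `toy` untouched
without_isbn = toy.drop(columns="isbn")
print(list(toy.columns))           # `toy` still has both columns here

# inplace=True: `toy` itself is modified and the call returns None
result = toy.drop(columns="isbn", inplace=True)
print(result, list(toy.columns))
```

Because the in-place call returns `None`, writing `df = df.drop(..., inplace=True)` would replace the dataframe with `None`.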

In [11]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_02.csv', index=True)

##### Activity 3. Extract the Publication Year

Write a script to extract the publication year from the `publishDate` column and create a new column named `YearPublished` in the dataframe.

**Hint**: Use the `.str.extract()` method with a regular expression to extract the year from the `publishDate` column.

Solution:

In [87]:
df['YearPublished'] = df['publishDate'].str.extract(r'(\d{4})')
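
The sketch below shows what the regular expression captures on two made-up date strings. It passes `expand=False` so that the result is a Series rather than a one-column dataframe; the variable names are assumptions for illustration.

```python
import pandas as pd

# Toy dates (hypothetical values in the same YYYY-MM-DD style)
dates = pd.Series(["2003-06-21", "1985-10-28"])

# (\d{4}) captures the first run of four digits in each string
years = dates.str.extract(r"(\d{4})", expand=False)

# The extracted values are strings; convert them when numbers are needed
years_int = years.astype(int)

print(years.tolist(), years_int.tolist())
```

Note that the extracted years are text until they are converted, which matters for any later numeric comparison.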

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_03.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0, 'dtype': {'YearPublished': 'int32'}}
assert_pd_dataframe_variable_column_equals_csv('df', 'YearPublished', 'sol_03.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 4. Filter Books with Ratings of 4.5 or Higher

Create a new dataframe that only includes books with a rating of 4.5 or higher. Name this new dataframe `best_books`.

**Hint**: Use a boolean mask to keep the rows whose rating is at least 4.5.

Solution:

In [88]:
best_books = df[df['rating'] >= 4.5]

This line of code creates a new dataframe `best_books` that only includes books with a rating of 4.5 or higher.
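
A boolean mask is a Series of `True`/`False` values, one per row. The sketch below, on made-up data with hypothetical names, shows the mask itself and how the choice between `>=` and `>` decides whether a book rated exactly 4.5 is kept.

```python
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({"title": ["A", "B", "C"], "rating": [4.5, 4.6, 4.2]})

# The comparison yields one boolean per row
mask = toy["rating"] >= 4.5
print(mask.tolist())

# Indexing with the mask keeps only the rows marked True
inclusive = toy[mask]                 # keeps the book rated exactly 4.5
strict = toy[toy["rating"] > 4.5]     # leaves it out

print(inclusive["title"].tolist(), strict["title"].tolist())
```

Filtering this way returns a new dataframe and leaves the original one unchanged.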

In [89]:
best_books.shape

(24, 26)

In [None]:
# save the dataframe to a new csv file
best_books.to_csv('activity_solutions_files/sol_04.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0, 'dtype': {'YearPublished': 'int32'}}
assert_pd_dataframe_variable_equals_csv('best_books', 'sol_04.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 5. Count and Add the Number of Genres

Each book is associated with multiple genres in the form of a list of strings. Create a new column `GenreCount` that stores the number of genres associated with each book.

**Hint**: Use the `apply` method with a custom lambda function to count the number of genres for each book.

Solution:

In [90]:
df['GenreCount'] = df['genres'].apply(lambda x: len(eval(x)))

This solution first evaluates the string representation of the list in the `genres` column back into a list (assuming the `genres` column contains string representations of lists) and then counts the number of elements (genres) in each list, storing this count in a new column `GenreCount`.
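
As an alternative sketch (not the notebook's solution), the standard library's `ast.literal_eval` parses the same string representation but only accepts Python literals, so it cannot run arbitrary code the way `eval` can. The dataframe below is made up for illustration.

```python
import ast

import pandas as pd

# Toy data (hypothetical values): lists stored as strings, as in the CSV
toy = pd.DataFrame({"genres": ["['Fantasy', 'Fiction']", "[]"]})

# literal_eval turns "['Fantasy', 'Fiction']" back into a real list
toy["GenreCount"] = toy["genres"].apply(lambda s: len(ast.literal_eval(s)))

print(toy)
```

For well-formed list strings like these, both approaches should give the same counts.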

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_05.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0}
assert_pd_dataframe_variable_column_equals_csv('df', 'GenreCount', 'sol_05.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 6. Split Author Names into First and Last Name Columns

Some analyses might require having the author's first and last names in separate columns. Write a script to create two new columns, `FirstName` and `LastName`, from the `author` column. For simplicity, assume the last word in the `author` field is the last name and everything before it is the first name.

**Hint**:
- Use the `str.split` method to split the `author` column into two columns.
- You can also use `.str.rsplit()` to split the string from the right and use the `n=1` parameter to split the `author` column into two parts.

Solution:

In [91]:
df[['FirstName', 'LastName']] = df['author'].str.rsplit(' ', expand=True, n=1)

This script splits the `author` column into two at the last occurrence of a space, assuming the format "FirstName LastName". It then assigns the resulting two columns to the dataframe, creating new columns for the first and last names.
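
The sketch below applies the same right-to-left split to three made-up author strings, including one with no space at all. The names `authors` and `parts` are assumptions for illustration.

```python
import pandas as pd

# Toy author names (hypothetical selection)
authors = pd.Series(["J.R.R. Tolkien", "F. Scott Fitzgerald", "Kioskerman"])

# Split once, starting from the right; expand=True returns a DataFrame
parts = authors.str.rsplit(pat=" ", n=1, expand=True)
parts.columns = ["FirstName", "LastName"]

print(parts)
```

A single-word name has nothing to split, so it lands in `FirstName` and leaves `LastName` missing.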

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_06.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0, 'dtype': {'YearPublished': 'int32'}}
assert_pd_dataframe_variable_equals_csv('df', 'sol_06.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 7. Drop Books with Fewer than 100 Pages

Some entries in the dataset might represent short stories or other short works. For this activity, remove all rows from the dataframe where the number of pages is less than 100.

**Hint**: Use a boolean mask to keep only the rows with at least 100 pages. Alternatively, pass the index labels of the short books to the `drop` method.

Solution:

In [93]:
df = df[df['pages'] >= 100]

This line of code keeps only the rows with at least 100 pages and reassigns the result to `df`, which removes all rows where the number of pages is less than 100.
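
The two routes to the same result can be sketched as follows, on a made-up dataframe with hypothetical names: keeping rows with a boolean mask, or dropping rows by their index labels.

```python
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({"title": ["A", "B", "C"], "pages": [40, 120, 100]})

# Route 1: keep the rows that satisfy the condition
kept_by_mask = toy[toy["pages"] >= 100]

# Route 2: find the labels of the unwanted rows, then drop them
short_rows = toy[toy["pages"] < 100].index
kept_by_drop = toy.drop(index=short_rows)

print(kept_by_mask.equals(kept_by_drop))
print(kept_by_mask.index.tolist())
```

Either way, the surviving rows keep their original index labels, so the index may have gaps afterwards.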

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_07.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0, 'dtype': {'YearPublished': 'int32'}}
assert_pd_dataframe_variable_equals_csv('df', 'sol_07.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 8. Extract the Primary Genre

Each book can belong to multiple genres. Create a new column `PrimaryGenre` that contains only the first genre listed for each book. The `genres` column stores each book's genres as a string representation of a list.

**Hint**: Because each value in the `genres` column is a string representation of a list, convert it back into a list first. Use the `.apply()` method with a lambda function to extract the first genre.

Solution:

In [94]:
df['PrimaryGenre'] = df['genres'].apply(lambda x: eval(x)[0] if len(eval(x)) > 0 else None)

In this code snippet, the first genre in the `genres` column is extracted and stored in a new column `PrimaryGenre` using the `apply` method with a custom lambda function. We used `eval` to convert the string representation of the list in the `genres` column back into a list.
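
One possible variation, sketched under the assumption that a named helper is acceptable, parses each string only once instead of twice. The helper `first_genre` is hypothetical and uses `ast.literal_eval` from the standard library in place of `eval`.

```python
import ast

import pandas as pd


def first_genre(raw):
    """Return the first genre in a stringified list, or None if it is empty."""
    genres = ast.literal_eval(raw)   # parse the string once
    return genres[0] if genres else None


# Toy data (hypothetical values)
toy = pd.DataFrame({"genres": ["['Comics', 'Fiction']", "[]"]})
toy["PrimaryGenre"] = toy["genres"].apply(first_genre)

print(toy)
```

An empty list has no first element, so the helper returns `None` for it rather than raising an `IndexError`.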

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_08.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0}
assert_pd_dataframe_variable_column_equals_csv('df', 'PrimaryGenre', 'sol_08.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 9. Flag Books with Multiple Awards

Create a new column `MultipleAwards` that flags books that have won multiple awards. If a book has won more than one award, set the value of `MultipleAwards` to `True`; otherwise, set it to `False`.

**Hint**: Use the `apply` method with a custom lambda function to check if the length of the `awards` list is greater than 1.

Solution:

In [95]:
df['MultipleAwards'] = df['awards'].apply(lambda x: len(eval(x)) > 1)

This line of code creates a new column `MultipleAwards` that flags books that have won multiple awards. If a book has won more than one award, the value of `MultipleAwards` is set to `True`; otherwise, it is set to `False`.

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_09.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0}
assert_pd_dataframe_variable_column_equals_csv('df', 'MultipleAwards', 'sol_09.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 10. Estimate Reading Time Based on Page Count

Assuming an average reading speed of 250 words per minute and approximately 300 words per page, create a new column `ReadingTimeHours` that estimates the reading time in hours for each book.

**Hint**: Calculate the reading time by multiplying the number of pages by the words per page, then divide by the words per minute, and finally divide by 60 to convert the reading time to hours.

Solution:

In [97]:
df['ReadingTimeHours'] = (df['pages'] * 300) / (250 * 60)

The code snippet above calculates the reading time in hours for each book and stores the result in a new column `ReadingTimeHours`. The reading time is estimated based on the number of pages and an average reading speed of 250 words per minute. We assume approximately 300 words per page.
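
To check the arithmetic by hand, take a hypothetical 432-page book: 432 × 300 = 129,600 words, 129,600 / 250 = 518.4 minutes, and 518.4 / 60 = 8.64 hours. The sketch below repeats the calculation with named constants; the constant names and the toy dataframe are assumptions for illustration.

```python
import pandas as pd

WORDS_PER_PAGE = 300      # assumption stated in the activity
WORDS_PER_MINUTE = 250    # assumption stated in the activity
MINUTES_PER_HOUR = 60

# Toy data (hypothetical page counts)
toy = pd.DataFrame({"pages": [432, 120]})

# pages -> words -> minutes -> hours
toy["ReadingTimeHours"] = (
    toy["pages"] * WORDS_PER_PAGE / WORDS_PER_MINUTE / MINUTES_PER_HOUR
)

print(toy)
```

Naming the constants makes it easier to change the assumptions later without touching the formula.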

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_10.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0}
assert_pd_dataframe_variable_column_equals_csv('df', 'ReadingTimeHours', 'sol_10.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 11. Flag 21st Century Publications

Create a new column `Published21stCentury` that flags (`True/False`) whether a book was published in the 21st century (year 2000 and onwards).

**Hint**: Use a boolean condition based on the `YearPublished` column to check if the year is greater than or equal to 2000.

Solution:

In [98]:
df['Published21stCentury'] = df['YearPublished'].astype(float) >= 2000
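
The `astype(float)` call matters when the years are stored as text, as they are after a string extraction. A minimal sketch on made-up values (the names `years` and `flags` are hypothetical):

```python
import pandas as pd

# Toy years stored as strings (hypothetical values)
years = pd.Series(["1997", "2005", "2000"])

# Convert to numbers first, then compare; the result is a boolean Series
flags = years.astype(float) >= 2000

print(flags.tolist())
```

The comparison is inclusive, so a book from the year 2000 itself is flagged `True`.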

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_11.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0}
assert_pd_dataframe_variable_column_equals_csv('df', 'Published21stCentury', 'sol_11.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 12. Simplifying the DataFrame by Dropping Columns

Drop the `coverImg`, `description`, and `ratingsByStars` columns from the dataframe as they will not be used in further analysis. Drop these columns permanently by setting the `inplace` parameter to `True`.

**Hint**: Use the `drop` method to remove multiple columns from the dataframe.

Solution:

In [100]:
df.drop(columns=['coverImg', 'description', 'ratingsByStars'], inplace=True)

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_12.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0, 'dtype': {'YearPublished': 'int32'}}
assert_pd_dataframe_variable_equals_csv('df', 'sol_12.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 13. Adding a New Book Entry

Add a new book entry to the dataframe with the following details:

```python
new_book = {
    "bookID": 10000,
    "title": "The Great Gatsby",
    "author": "F. Scott Fitzgerald",
    "rating": 3.9,
    "pages": 180,
    "publishDate": "2003-09-30",
    "publisher": "Scribner",
    "price": 7.99,
    "genres": "['Fiction', 'Classics']",
    "YearPublished": 2003,
    "GenreCount": 2,
    "FirstName": "F.",
    "LastName": "Fitzgerald",
    "PrimaryGenre": "Fiction",
    "MultipleAwards": False,
    "ReadingTimeHours": 9.0,
    "Published21stCentury": True
}
```

> Add this new entry at index `len(df)`.

**Hint**: Use the `pd.concat()` function to add a new row to the dataframe.

Solution:

In [104]:
new_book = {
    "bookID": 10000,
    "title": "The Great Gatsby",
    "author": "F. Scott Fitzgerald",
    "rating": 3.9,
    "pages": 180,
    "publishDate": "2003-09-30",
    "publisher": "Scribner",
    "price": 7.99,
    "genres": "['Fiction', 'Classics']",
    "YearPublished": 2003,
    "GenreCount": 2,
    "FirstName": "F.",
    "LastName": "Fitzgerald",
    "PrimaryGenre": "Fiction",
    "MultipleAwards": False,
    "ReadingTimeHours": 9.0,
    "Published21stCentury": True
}
new_df = pd.DataFrame(new_book, index=[len(df)])
df = pd.concat([df, new_df])    
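
The same pattern can be sketched on a tiny made-up dataframe. The example also shows what happens to a column that the new row does not provide; all names and values are assumptions for illustration.

```python
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({"title": ["A", "B"], "pages": [100, 200]})

# A dict of scalars needs an explicit index to become a one-row DataFrame
new_row = pd.DataFrame({"title": "C"}, index=[len(toy)])

# concat stacks the rows; columns missing from new_row are filled with NaN
toy = pd.concat([toy, new_row])

print(toy)
print(toy["pages"].dtype)
```

Because `NaN` is a floating-point value, an integer column that receives a missing value is converted to `float64`.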

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_13.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0}
assert_pd_dataframe_variable_equals_csv('df', 'sol_13.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 14. Transforming Publish Dates into Datetime Format

The `publishDate` and `firstPublishDate` columns contain dates in object (string) format. Convert these columns into datetime objects to enable more sophisticated date-based operations and analyses.

> Use the format `"%Y-%m-%d"` to convert the string dates into datetime objects and also pass the `errors='coerce'` parameter to handle any errors that may occur during the conversion.

**Hint**: Use the `pd.to_datetime()` function to convert the string dates into datetime objects. You can apply this function directly to the columns.

Solution:

In [108]:
# Convert 'publishDate' and 'firstPublishDate' columns to datetime
df['publishDate'] = pd.to_datetime(df['publishDate'], errors='coerce', format='%Y-%m-%d')
df['firstPublishDate'] = pd.to_datetime(df['firstPublishDate'], errors='coerce', format='%Y-%m-%d')
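
The effect of `errors='coerce'` can be seen on a short made-up Series that contains one value that is not a date (the names `raw` and `parsed` are hypothetical):

```python
import pandas as pd

# Toy values (hypothetical); the middle one does not match the format
raw = pd.Series(["2003-06-21", "not a date", "1985-10-28"])

# Unparseable values become NaT (missing) instead of raising an error
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed)
print(parsed.dt.year.tolist())
```

Once a column holds datetimes, the `.dt` accessor exposes parts such as the year, month, and day.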

In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_14.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0, 'parse_dates': ['publishDate', 'firstPublishDate']}
assert_pd_dataframe_variable_equals_csv('df', 'sol_14.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 15. Bulk Adding New Book Entries to the DataFrame

Add multiple new book entries to the DataFrame at once. This activity involves creating a list of dictionaries, where each dictionary represents a new book entry with values for all the relevant columns, and then appending this list to the existing DataFrame.

Below are the details of the new book entries:

```python
new_books = [
    {
        "bookID": 10001,
        "title": "To Kill a Mockingbird",
        "author": "Harper Lee",
        "rating": 4.3,
        "pages": 281,
        "publishDate": "1960-07-11",
        "publisher": "J.B. Lippincott & Co.",
        "price": 9.99,
        "genres": "['Fiction', 'Classics']",
        "YearPublished": 1960,
        "GenreCount": 2,
        "FirstName": "Harper",
        "LastName": "Lee",
        "PrimaryGenre": "Fiction",
        "MultipleAwards": False,
        "ReadingTimeHours": 11.24,
        "Published21stCentury": False
    },
    {
        "bookID": 10002,
        "title": "1984",
        "author": "George Orwell",
        "rating": 4.2,
        "pages": 328,
        "publishDate": "1949-06-08",
        "publisher": "Secker & Warburg",
        "price": 12.99,
        "genres": "['Fiction', 'Classics']",
        "YearPublished": 1949,
        "GenreCount": 2,
        "FirstName": "George",
        "LastName": "Orwell",
        "PrimaryGenre": "Fiction",
        "MultipleAwards": False,
        "ReadingTimeHours": 13.12,
        "Published21stCentury": False
    }
]
```

> Add these new entries to the dataframe at positions `len(df)` and `len(df) + 1`, respectively.

Solution:

In [109]:
new_books = [
    {
        "bookID": 10001,
        "title": "To Kill a Mockingbird",
        "author": "Harper Lee",
        "rating": 4.3,
        "pages": 281,
        "publishDate": "1960-07-11",
        "publisher": "J.B. Lippincott & Co.",
        "price": 9.99,
        "genres": "['Fiction', 'Classics']",
        "YearPublished": 1960,
        "GenreCount": 2,
        "FirstName": "Harper",
        "LastName": "Lee",
        "PrimaryGenre": "Fiction",
        "MultipleAwards": False,
        "ReadingTimeHours": 11.24,
        "Published21stCentury": False
    },
    {
        "bookID": 10002,
        "title": "1984",
        "author": "George Orwell",
        "rating": 4.2,
        "pages": 328,
        "publishDate": "1949-06-08",
        "publisher": "Secker & Warburg",
        "price": 12.99,
        "genres": "['Fiction', 'Classics']",
        "YearPublished": 1949,
        "GenreCount": 2,
        "FirstName": "George",
        "LastName": "Orwell",
        "PrimaryGenre": "Fiction",
        "MultipleAwards": False,
        "ReadingTimeHours": 13.12,
        "Published21stCentury": False
    }
]

new_books_df = pd.DataFrame(new_books)
df = pd.concat([df, new_books_df], ignore_index=True)
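
The role of `ignore_index=True` can be sketched on a made-up dataframe whose index has gaps, as can happen after rows are removed. All names and values below are assumptions for illustration.

```python
import pandas as pd

# Toy data with a non-contiguous index (hypothetical values)
toy = pd.DataFrame({"title": ["A", "B"]}, index=[5, 9])

# A list of dicts becomes one row per dict, labelled 0, 1, ...
extra = pd.DataFrame([{"title": "C"}, {"title": "D"}])

# Default: each frame keeps its own labels
kept = pd.concat([toy, extra])

# ignore_index=True: the result is relabelled 0..n-1
renumbered = pd.concat([toy, extra], ignore_index=True)

print(kept.index.tolist(), renumbered.index.tolist())
```

With the relabelled result, the new rows end up at the last positions of a clean, gap-free index.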

In [110]:
df.tail()

Unnamed: 0,bookId,title,series,author,rating,language,genres,characters,bookFormat,edition,...,price_to_rating,YearPublished,GenreCount,FirstName,LastName,PrimaryGenre,MultipleAwards,ReadingTimeHours,Published21stCentury,bookID
758,808853.The_Complete_Adventures_of_Curious_George,The Complete Adventures of Curious George,Curious George Original Adventures,"Margret Rey, H.A. Rey",4.23,English,"['Childrens', 'Picture Books', 'Fiction', 'Cla...",[],Hardcover,70th Anniversary Edition,...,1.404255,2001,10,"Margret Rey, H.A.",Rey,Childrens,False,8.64,True,
759,7199219-ed-n,Edén,Edén #1,Kioskerman,3.97,Spanish,"['Comics', 'Graphic Novels', 'Graphic Novels C...",[],Paperback,Recopilatorio.,...,3.778338,2009,5,Kioskerman,,Comics,False,2.4,True,
760,,The Great Gatsby,,F. Scott Fitzgerald,3.9,,"['Fiction', 'Classics']",,,,...,,2003,2,F.,Fitzgerald,Fiction,False,9.0,True,10000.0
761,,To Kill a Mockingbird,,Harper Lee,4.3,,"['Fiction', 'Classics']",,,,...,,1960,2,Harper,Lee,Fiction,False,11.24,False,10001.0
762,,1984,,George Orwell,4.2,,"['Fiction', 'Classics']",,,,...,,1949,2,George,Orwell,Fiction,False,13.12,False,10002.0


In [None]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_15.csv', index=True)

Assertions:

In [None]:
read_csv_kwargs = {'index_col': 0, 'parse_dates': ['publishDate', 'firstPublishDate']}
assert_pd_dataframe_variable_equals_csv('df', 'sol_15.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 16. Save the Updated DataFrame to a New CSV File

Save the updated dataframe `df` to a new CSV file named `updated_best_book.csv` in the current directory.

> Make sure not to reset the index before saving. The solution writes the file without the index column (`index=False`).

Solution:

In [14]:
# save the dataframe to a new csv file
df.to_csv('updated_best_book.csv', index=False)
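
The `index` argument of `to_csv` controls whether the row labels are written as the first column. The sketch below uses a made-up dataframe and calls `to_csv` without a path, which returns the CSV text as a string instead of writing a file.

```python
import pandas as pd

# Toy data (hypothetical values) with a non-default index
toy = pd.DataFrame({"title": ["A", "B"]}, index=[5, 9])

# index=True writes the row labels as an unnamed first column
with_index = toy.to_csv(index=True)

# index=False writes only the data columns
without_index = toy.to_csv(index=False)

print(with_index)
print(without_index)
```

A file saved with the index and read back without `index_col=0` gains an extra `Unnamed: 0` column.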

In [17]:
# save the dataframe to a new csv file
df.to_csv('activity_solutions_files/sol_16.csv', index=False)

Assertions:

In [18]:
read_csv_kwargs = {'index_col': 0}
assert_pd_dataframe_csv_equals_csv('updated_best_book.csv', 'sol_16.csv', read_csv_kwargs=read_csv_kwargs)

##### Activity 17. Save a DataFrame to a CSV File

Convert the dictionary below into a dataframe named `student_df`, then save this dataframe to a CSV file called `student_data.csv`.

```python
student_dict = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [18, 17, 19, 18, 17],
    'Grade': ['A', 'B', 'A', 'B', 'A']
}
```

Solution:

In [20]:
import pandas as pd

# Step 1: Given dictionary
student_dict = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [18, 17, 19, 18, 17],
    'Grade': ['A', 'B', 'A', 'B', 'A']
}

# Step 2: Convert the dictionary into a DataFrame
student_df = pd.DataFrame(student_dict)

# Step 3: Save the DataFrame to a CSV file
student_df.to_csv('student_data.csv', index=False)


In [21]:
student_df.to_csv('activity_solutions_files/sol_17.csv', index=False)

Assertions:

In [23]:
student_dict = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [18, 17, 19, 18, 17],
    'Grade': ['A', 'B', 'A', 'B', 'A']
}

expected_student_df = pd.DataFrame(student_dict)
assert_pd_dataframe_variable_equals_variable('student_df', 'expected_student_df', delete_afterwards=True)

assert_pd_dataframe_csv_equals_csv('student_data.csv', 'sol_17.csv')