# Instructions for Pandas Practice
---
Welcome to the Pandas Practice! In this exercise, you will work with the Rick and Morty dataset to practice your data manipulation and analysis skills using Pandas. Please follow the instructions below carefully:

1. **Understand the Dataset**: The dataset used in this practice is related to the Rick and Morty TV show. Familiarize yourself with the dataset's structure and content before starting.

2. **Read Each Question Thoroughly**: Each question in this practice is designed to test different Pandas functionalities. Make sure you understand the requirements of each question before you begin coding.

3. **Write Your Code in the Provided Cells**: For each question, a code cell is provided where you should write your solution. Do not modify any other cells or sections of the notebook.

4. **Execute Your Code**: After writing your code in each cell, run the cell to check if your code produces the expected results. Verify that the output matches the requirements specified in the question.

5. **Complete All Questions**: Ensure that you attempt and complete all the questions provided in the notebook. Each question is designed to test different Pandas skills and concepts.

6. **Review Your Work**: Before submitting, double-check your answers and make sure all questions are addressed. Ensure that the notebook runs without errors and that all outputs are correct.

7. **Download Your Notebook**: Once you have completed all the questions and verified your solutions, download the notebook file (.ipynb). You can do this by selecting `File` > `Download` > `Download .ipynb` from the Google Colab menu.

8. **Submit Your Practice**: Upload the downloaded .ipynb file to the designated learning platform for submission.

9. **Verify Submission**: Ensure that you have uploaded the correct file and that it is not corrupted. If you encounter any issues with the file, you may need to resubmit.

---

Good luck with your practice!

In [17]:
import numpy as np
import pandas as pd

locations_data_url = "https://s3.ap-south-1.amazonaws.com/new-assets.ccbp.in/frontend/content/aiml/classical-ml/locations_dataset.csv"
episodes_data_url = "https://s3.ap-south-1.amazonaws.com/new-assets.ccbp.in/frontend/content/aiml/classical-ml/episodes_dataset.csv"
character_data_url = "https://s3.ap-south-1.amazonaws.com/new-assets.ccbp.in/frontend/content/aiml/classical-ml/characters_dataset.csv"

characters = pd.read_csv(character_data_url, index_col=0)
episodes = pd.read_csv(episodes_data_url, index_col=0)
locations = pd.read_csv(locations_data_url, index_col=0)
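
Before answering the questions, it can help to inspect what was loaded. The sketch below uses a small, made-up CSV in place of the real files; its column names, rows, and ISO date format are assumptions for illustration only. It shows how `shape`, `dtypes`, and `head()` reveal the structure, and that a date column arrives as text until it is converted.

```python
import io
import pandas as pd

# Hypothetical three-row CSV standing in for the episodes file; the unnamed
# first column mimics a saved index, which index_col=0 turns back into the index.
toy_csv = io.StringIO(
    ",id,name,air_date,episode\n"
    "0,1,Episode A,2013-12-02,S01E01\n"
    "1,2,Episode B,2013-12-09,S01E02\n"
    "2,3,Episode C,2014-01-13,S01E03\n"
)
toy = pd.read_csv(toy_csv, index_col=0)

print(toy.shape)   # (rows, columns)
print(toy.dtypes)  # air_date is read as text, not as a datetime
print(toy.head())

# Converting the text column makes the .dt accessor available.
air_years = pd.to_datetime(toy["air_date"]).dt.year
print(air_years.tolist())
```

The same three calls can be run on `characters`, `episodes`, and `locations` to see which columns need conversion before filtering.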

Q1. List the `name`s of all episodes that were released (based on air_date) in the year 2014 and print them as a Python list.


In [18]:
episodes['air_date'] = pd.to_datetime(episodes['air_date'])
episodes_2014 = episodes[episodes['air_date'].dt.year == 2014]['name'].tolist()
print(episodes_2014)

['M. Night Shaym-Aliens!', 'Meeseeks and Destroy', 'Rick Potion #9', 'Raising Gazorpazorp', 'Rixty Minutes', 'Something Ricked This Way Comes', 'Close Rick-counters of the Rick Kind', 'Ricksy Business']


Q2. From the `episodes`, extract the year from `air_date`, determine which year has the most episodes, and print that year.

In [19]:
episodes['year'] = pd.to_datetime(episodes['air_date']).dt.year
most_common_year = episodes['year'].value_counts().idxmax()
print(most_common_year)

2015


Q3. From the `episodes`, find the **earliest air date** among all episodes and print it.

In [20]:
earliest_episode = pd.to_datetime(episodes['air_date']).min()

print(earliest_episode)

2013-12-02 00:00:00


Q4. From the `locations` data, find all dimensions that appear 10 or more times and print their names as a Python list.

In [21]:
# Compute the counts once, then keep the dimensions that occur at least 10 times
dimension_counts = locations['dimension'].value_counts()
large_dimensions = dimension_counts[dimension_counts >= 10].index.tolist()
print(large_dimensions)

['unknown']


Q5. Count how many characters have the status **Alive** and print the total number.

In [22]:
alive_characters = characters[characters['status'] == 'Alive'].shape[0]

print(alive_characters)

8


Q6. Calculate the **average number of characters** that appeared per episode (using the 'characters' column) and print the result.

In [23]:
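import ast  # literal_eval parses the stringified lists without executing arbitrary code

avg_characters_per_episode = episodes['characters'].apply(lambda x: len(ast.literal_eval(x))).mean()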
print(avg_characters_per_episode)

25.5


Q7. Calculate the **percentage of characters whose status is `Alive`** and print the result formatted to **two decimal places followed by %**.

In [24]:
alive_percentage = (characters[characters['status'] == 'Alive'].shape[0] / characters.shape[0]) * 100
print("{:.2f}%".format(alive_percentage))

40.00%


Q8. Count how many episodes have an `episode` code earlier than **S03E1** and an `air_date` before 2014, then print the count.

In [25]:
# Filter episodes whose code sorts before "S03E1" and whose air date is before 2014
# Ensure 'air_date' is in datetime format
episodes['air_date'] = pd.to_datetime(episodes['air_date'])
count_ids = len(episodes[(episodes['episode'] < "S03E1") & (episodes['air_date'].dt.year < 2014)])

print(count_ids)

3
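
The comparison `episodes['episode'] < "S03E1"` is a plain string comparison, so it orders codes character by character rather than by season and episode number. The sketch below uses made-up codes and assumes a zero-padded `SxxEyy` layout to show where the two orderings differ, and one possible way to compare on the numeric parts instead.

```python
import pandas as pd

# Hypothetical episode codes; the zero-padded "SxxEyy" layout is an assumption.
codes = pd.Series(["S01E01", "S02E10", "S03E01", "S03E10"])

# String comparison is lexicographic: "S03E01" sorts before "S03E1"
# because '0' < '1' at the fifth character.
lexicographic = (codes < "S03E1").tolist()
print(lexicographic)  # [True, True, True, False]

# Extract the season and episode numbers and compare them as integers.
parts = codes.str.extract(r"S(?P<season>\d+)E(?P<number>\d+)").astype(int)
before_season_3 = (parts["season"] < 3).tolist()
print(before_season_3)  # [True, True, False, False]
```

When the date filter already excludes later seasons, both approaches give the same count; the difference only matters when season 3 codes are present in the filtered rows.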


Q9. Filter all episodes whose `'air_date'` year is **between 2011 and 2014 (inclusive)**, count the **number of unique episode IDs**, and print the result.


In [26]:
# Ensure 'air_date' is in datetime format
episodes['air_date'] = pd.to_datetime(episodes['air_date'])

# Filter episodes between 2011 and 2014
filtered_episodes = episodes[(episodes['air_date'].dt.year >= 2011) & (episodes['air_date'].dt.year <= 2014)]

# Count the unique episode IDs
count_ids = filtered_episodes['id'].nunique()

print(count_ids)


11


Q10. From the `episodes`, find and print the **latest (most recent) air date** among all episodes.


In [27]:
latest_date = pd.to_datetime(episodes['air_date']).max()

print(latest_date)

2015-09-27 00:00:00


Q11. Calculate the **percentage of characters whose status is `Dead`** and print the result formatted to **two decimal places followed by %**.


In [28]:
dead_percentage = (characters[characters['status'] == 'Dead'].shape[0] / characters.shape[0]) * 100
print("{:.2f}%".format(dead_percentage))

30.00%
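
The Alive and Dead percentages can also be read from a single table. As a minimal sketch on made-up status values (the data here is hypothetical), `value_counts(normalize=True)` returns each category's share of the rows, which can then be scaled and formatted.

```python
import pandas as pd

# Hypothetical status values, not taken from the real characters file.
status = pd.Series(["Alive", "Dead", "Alive", "unknown", "Alive"], name="status")

# normalize=True returns fractions of the total instead of raw counts.
shares = status.value_counts(normalize=True) * 100

# Format every share to two decimal places followed by %.
formatted = shares.map("{:.2f}%".format)
print(formatted)
```

With the real data, `characters['status'].value_counts(normalize=True)` would play the role of `status.value_counts(normalize=True)` here.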


Q12. Select and print all location names that start with the letter "S".

In [29]:
# Keep the rows whose name starts with "S" (the result is a DataFrame, not a count)
s_locations = locations[locations['name'].str.startswith('S')]
print(s_locations['name'])

15    St. Gloopy Noops Hospital
Name: name, dtype: object


Q13. Group `locations` by `dimension` and print the count of locations in each dimension.

In [30]:
locations_by_dimension = locations.groupby('dimension').size()
print(locations_by_dimension)

dimension
Cronenberg Dimension           1
Dimension 5-126                1
Dimension C-137                3
Fantasy Dimension              1
Post-Apocalyptic Dimension     1
Replacement Dimension          3
unknown                       10
dtype: int64


Q14. Group characters by `gender` and calculate the **average number of episodes per character**. Print the result.

In [31]:
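# Note: .str.len() counts list items only when 'episode' holds real lists; on the
# string form read from a CSV it returns the number of characters in each string.
mean_episodes_by_gender = characters.groupby('gender')['episode'].apply(lambda x: x.str.len().mean())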
print(mean_episodes_by_gender)

gender
Female     986.750000
Male       473.733333
unknown     46.000000
Name: episode, dtype: float64
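
Means in the hundreds suggest that `.str.len()` is measuring the length of the text in `episode` rather than the number of episodes. As a hedged sketch on made-up rows (the stringified-list layout is an assumption about how the column looks after loading from CSV), parsing each value with `ast.literal_eval` first gives the number of list items.

```python
import ast
import pandas as pd

# Hypothetical rows; each 'episode' value is a list stored as text.
toy = pd.DataFrame({
    "gender": ["Male", "Male", "Female"],
    "episode": ["['e1', 'e2', 'e3']", "['e1']", "['e1', 'e2', 'e3', 'e4']"],
})

# Length of the text itself (brackets, quotes, commas and spaces included).
string_lengths = toy["episode"].str.len()

# Parse the text into real lists, then count the items in each list.
episode_counts = toy["episode"].apply(ast.literal_eval).apply(len)

# Average number of episodes per character within each gender.
mean_by_gender = episode_counts.groupby(toy["gender"]).mean()
print(mean_by_gender)
```

If the real column is already a list, the parsing step is unnecessary and `.str.len()` counts items directly.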


Q15. List all `locations` whose `name` contains "Earth" (case-insensitive) and print them as a Python list.

In [32]:
# case=False makes the match case-insensitive, as the question requires
earth_locations = locations[locations['name'].str.contains('Earth', case=False, na=False)]['name'].tolist()
print(earth_locations)

['Earth (C-137)', 'Post-Apocalyptic Earth', 'Cronenberg Earth', 'Earth (5-126)', 'Earth (Replacement Dimension)']


# End!