# Lab Instructions

Find a dataset that interests you. I'd recommend starting on [Kaggle](https://www.kaggle.com/). Read through all of the material about the dataset and download a .CSV file.

1. Write a short summary of the data.  Where did it come from?  How was it collected?  What are the features in the data?  Why is this dataset interesting to you?  

2. Identify 5 interesting questions about your data that you can answer using Pandas methods.  

3. Answer those questions!  You may use any method you want (including LLMs) to help you write your code; however, you should use Pandas to find the answers.  LLMs will not always write code in this way without specific instruction.  

4. Write the answer to your question in a text box underneath the code you used to calculate the answer.



# Dataset summary and plan

I loaded the dataset file `vanguard_cards.csv`, which is in this repository. The CSV contains trading-card attributes in these columns: `card_name`, `clan`, `grade`, `power`, `shield`, `rarity`, `nation`, and `skill_type`. The loaded DataFrame also shows an `Unnamed: 0` column holding the row numbers 0 through 9, which looks like a saved row index rather than a card attribute.

Source: file included in the course Lab folder (`vanguard_cards.csv`). The original source and collection method are not specified in the file metadata.

Why this dataset is interesting: it lets us compare card performance (power and shield) across different `clan`s and `grade`s, look at rarity distributions, and ask practical questions such as which clan has the highest average power or which card gives the best "power per grade" value.

Five questions I will answer in this lab:

1. Which clan has the highest average power? (use groupby + mean)
2. How many cards are in each grade? (use value_counts)
3. What is the distribution of card rarities? (use value_counts)
4. On average, do grade 1 or grade 2 cards provide higher shields? (compare means)
5. Which card has the highest power-to-grade ratio (power divided by grade)?

I will use Pandas code cells to compute each answer and write a short text explanation directly below the code cell with the results.


In [4]:
import pandas as pd

df = pd.read_csv("vanguard_cards.csv", engine="python")
df



| Unnamed: 0 | card_name | clan | grade | power | shield | rarity | nation | skill_type |
|---|---|---|---|---|---|---|---|---|
| 0 | Blaster Blade | Royal Paladin | 2 | 10000 | 5000 | RRR | Keter Sanctuary | ACT |
| 1 | Dragonic Overlord | Kagero | 3 | 13000 | 0 | RRR | Dragon Empire | ACT |
| 2 | Silent Tom | Oracle Think Tank | 2 | 9000 | 5000 | R | United Sanctuary | CONT |
| 3 | Demon Eater | Nubatama | 3 | 12000 | 0 | RR | Dragon Empire | AUTO |
| 4 | Nightmare Doll Alice | Pale Moon | 2 | 8000 | 5000 | RR | Dark States | AUTO |
| 5 | Holy Flame Dragon | Keter Sanctuary | 3 | 13000 | 0 | RRR | Keter Sanctuary | ACT |
| 6 | Phantom Blaster Dragon | Shadow Paladin | 3 | 13000 | 0 | SP | Keter Sanctuary | ACT |
| 7 | Steam Maiden Ul | &Gear Chronicle | 1 | 7000 | 10000 | C | Dark States | AUTO |
| 8 | Archbird of Vitality | Stoicheia | 1 | 6000 | 10000 | C | Stoicheia | CONT |
| 9 | Mighty Bolt Dragoon | Narukami | 2 | 10000 | 5000 | R | Dragon Empire | AUTO |


### Quick look at the dataset

I loaded the CSV into `df` and displayed the DataFrame. The table above shows all 10 rows and every column; on a larger dataset, `df.shape` and `df.head()` give a quicker overview.

This notebook will explicitly answer the five questions listed above using Pandas operations and short textual explanations below each result.
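As a minimal sketch of the inspection step, the snippet below loads a three-row stand-in for the CSV and checks its shape and column types. It assumes the `Unnamed: 0` column comes from an unnamed first column in the file (a saved row index), which the lab file itself does not confirm; the `csv_text` string and the `sample` name are illustrative only.

```python
import io
import pandas as pd

# Stand-in for vanguard_cards.csv: three rows copied from the table above.
# ASSUMPTION: the real file has an empty first header cell, which is why
# pandas labels that column "Unnamed: 0" when it is loaded as data.
csv_text = """,card_name,clan,grade,power,shield,rarity,nation,skill_type
0,Blaster Blade,Royal Paladin,2,10000,5000,RRR,Keter Sanctuary,ACT
1,Dragonic Overlord,Kagero,3,13000,0,RRR,Dragon Empire,ACT
2,Silent Tom,Oracle Think Tank,2,9000,5000,R,United Sanctuary,CONT
"""

# index_col=0 uses the first column as the row index instead of
# loading it as a data column.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(sample.shape)   # (number of rows, number of columns)
print(sample.dtypes)  # check that grade, power, and shield parsed as numbers
print(sample.head())
```

If the assumption holds for the real file, passing `index_col=0` to the original `pd.read_csv` call would keep the row numbers out of the data columns.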


In [5]:
avg_power = df.groupby("clan")["power"].mean().sort_values(ascending=False)
avg_power


clan
Kagero               13000.0
Keter Sanctuary      13000.0
Shadow Paladin       13000.0
Nubatama             12000.0
Royal Paladin        10000.0
Narukami             10000.0
Oracle Think Tank     9000.0
Pale Moon             8000.0
&Gear Chronicle       7000.0
Stoicheia             6000.0
Name: power, dtype: float64

**Question 1 — Which clan has the highest average power?**

The code cell above computes the average `power` for each `clan` and sorts the results from highest to lowest. Three clans tie for the highest average at 13000: Kagero, Keter Sanctuary, and Shadow Paladin. Each clan appears on only one card in this 10-row dataset, so every average is simply that one card's power.
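Reading only the first sorted row can hide a tie. One way to list every clan that shares the top average is to compare each mean against the maximum. This is a sketch, not part of the original notebook; the `cards` frame rebuilds the `clan` and `power` columns from the table shown earlier, and `top_clans` is an illustrative name.

```python
import pandas as pd

# Clan and power values copied from the table shown earlier in this notebook.
cards = pd.DataFrame({
    "clan": ["Royal Paladin", "Kagero", "Oracle Think Tank", "Nubatama",
             "Pale Moon", "Keter Sanctuary", "Shadow Paladin",
             "&Gear Chronicle", "Stoicheia", "Narukami"],
    "power": [10000, 13000, 9000, 12000, 8000,
              13000, 13000, 7000, 6000, 10000],
})

avg_power = cards.groupby("clan")["power"].mean()

# Keep every clan whose mean equals the maximum, so tied clans all appear.
top_clans = avg_power[avg_power == avg_power.max()]
print(top_clans)
```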

In [6]:
grade_counts = df["grade"].value_counts().sort_index()
grade_counts


grade
1    2
2    4
3    4
Name: count, dtype: int64

**Question 2 — How many cards are in each grade?**

The `value_counts()` call above counts how many cards belong to each `grade`, and `sort_index()` orders the result by grade. There are 2 grade 1 cards, 4 grade 2 cards, and 4 grade 3 cards.

In [7]:
rarity_counts = df["rarity"].value_counts()
rarity_counts


rarity
RRR    3
R      2
RR     2
C      2
SP     1
Name: count, dtype: int64

**Question 3 — What is the distribution of card rarities?**

The `rarity_counts` output above shows how many cards appear at each `rarity` level: RRR is the most common with 3 cards; R, RR, and C have 2 cards each; and SP appears once. These counts describe only the 10 cards in this file, not how rare each level is in the game overall.
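A distribution is often easier to read as shares than as raw counts. The sketch below uses the `normalize=True` option of `value_counts` for that; the `rarity` Series rebuilds the column from the table shown earlier, and `rarity_share` is an illustrative name.

```python
import pandas as pd

# Rarity values copied, in row order, from the table shown earlier in this notebook.
rarity = pd.Series(
    ["RRR", "RRR", "R", "RR", "RR", "RRR", "SP", "C", "C", "R"],
    name="rarity",
)

# normalize=True returns each rarity's fraction of all rows instead of a count.
rarity_share = rarity.value_counts(normalize=True)
print(rarity_share)
```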

In [8]:
grade1_shield = df[df["grade"] == 1]["shield"].mean()
grade2_shield = df[df["grade"] == 2]["shield"].mean()

grade1_shield, grade2_shield


(np.float64(10000.0), np.float64(5000.0))

**Question 4 — Do grade 1 or grade 2 cards provide higher shields on average?**

The cell above computes the mean `shield` for grade 1 and grade 2 cards. Grade 1 cards average 10000 shield and grade 2 cards average 5000, so grade 1 cards provide higher shields on average in this dataset.
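The same comparison can be made for every grade at once with a single `groupby`, which avoids writing one boolean filter per grade. This is an alternative sketch rather than the method used in the cell above; the `cards` frame rebuilds the `grade` and `shield` columns from the table shown earlier.

```python
import pandas as pd

# Grade and shield values copied, in row order, from the table shown earlier.
cards = pd.DataFrame({
    "grade":  [2, 3, 2, 3, 2, 3, 3, 1, 1, 2],
    "shield": [5000, 0, 5000, 0, 5000, 0, 0, 10000, 10000, 5000],
})

# One groupby computes the mean shield for each grade in a single pass.
shield_by_grade = cards.groupby("grade")["shield"].mean()
print(shield_by_grade)
```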

In [9]:
df["ratio"] = df["power"] / df["grade"]
highest_ratio = df.loc[df["ratio"].idxmax(), ["card_name", "ratio"]]
highest_ratio


card_name    Steam Maiden Ul
ratio                 7000.0
Name: 7, dtype: object

**Question 5 — Which card has the highest power-to-grade ratio?**

This ratio (power divided by grade) gives a simple measure of how much power a card provides relative to its grade. Steam Maiden Ul has the highest ratio at 7000.0 (7000 power at grade 1). Because the ratio divides by `grade`, a grade 0 card would produce an infinite value; this dataset has none, so the result is unaffected.
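If the ratio were computed on a dataset that did contain grade 0 cards, pandas would return `inf` for those rows and `idxmax` would select one of them. A minimal sketch of one way to guard against that is to filter to positive grades first. The grade 0 row below is invented purely for illustration and is not in `vanguard_cards.csv`; the other three rows come from the table shown earlier.

```python
import pandas as pd

cards = pd.DataFrame({
    "card_name": ["Steam Maiden Ul", "Archbird of Vitality", "Blaster Blade",
                  "Hypothetical Grade 0"],  # last row is invented for illustration
    "grade": [1, 1, 2, 0],
    "power": [7000, 6000, 10000, 5000],
})

# Restrict to rows with a positive grade so the division never produces inf.
valid = cards[cards["grade"] > 0].copy()
valid["ratio"] = valid["power"] / valid["grade"]

best = valid.loc[valid["ratio"].idxmax(), ["card_name", "ratio"]]
print(best)
```

Whether grade 0 cards should be excluded or scored some other way is a modeling choice; filtering them out is only the simplest option.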

## Submission instructions

I've answered the five lab questions above using Pandas and included a short explanation under each code cell. To submit for the course:

1. Save and commit `Lab/1.5 Intro to Python and Pandas Lab.ipynb` to your GitHub repository (the file path in this repo).
2. Send the instructor the direct GitHub link to this notebook (for example: https://github.com/<your-username>/Data_Visualization_And_Modeling_Online-main/blob/main/Lab/1.5%20Intro%20to%20Python%20and%20Pandas%20Lab.ipynb).

Before submitting, open the notebook in Jupyter or VS Code and run all cells so outputs appear (this helps reviewers see your results and makes grading straightforward).