
# 🏁 Exploratory Data Analysis (EDA) — Daniel Ricciardo

In this notebook, we'll explore data about **Daniel Ricciardo's Formula 1 career** to practice **basic EDA** (Exploratory Data Analysis).  
We'll learn how to:
- Load and preview a dataset  
- Check for missing data  
- Summarize and understand columns  
- Create simple charts to spot patterns

> This is a gentle walkthrough — perfect for building EDA confidence.


## 1️⃣ Import Libraries and Load Data

In [None]:

import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("daniel_ricciardo.csv")
df.head()



### 🧐 Step 1: Let's understand what we just loaded!
We'll check the **shape** and **data types**, and whether any values are **missing**.


In [None]:

print("Shape of data:", df.shape)
print("\n--- Data Info ---")
df.info()  # info() prints its report directly and returns None, so no print() is needed
print("\n--- Missing Values ---")
print(df.isna().sum())
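
Raw missing counts are easier to judge next to the share of rows they represent. Here is a minimal sketch of that idea. It uses a tiny made-up table (`toy`) in place of the real CSV, so the column values are placeholders and not actual career data.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV; all values are placeholders.
toy = pd.DataFrame({
    "year": [2011, 2011, 2012, 2012],
    "team": ["Team A", "Team A", "Team B", None],
    "event": ["Race 1", None, None, "Race 4"],
})

# isna().sum() counts missing cells per column;
# isna().mean() gives the fraction of rows that are missing.
missing = pd.DataFrame({
    "n_missing": toy.isna().sum(),
    "pct_missing": (toy.isna().mean() * 100).round(1),
}).sort_values("n_missing", ascending=False)

print(missing)
```

With the real data, swapping `toy` for `df` should give the same kind of table, sorted so the columns with the most gaps come first.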



## 2️⃣ Quick Overview of the Data

Let's describe both **numerical** and **categorical** columns to get a sense of the data.


In [None]:

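# display() is built into Jupyter; this import keeps the cell runnable elsewhere
from IPython.display import display

print("Numerical Summary:")
display(df.describe())
print("\nCategorical Summary:")
display(df.describe(include='object'))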
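
Before summarizing, it can help to see which columns pandas treats as numeric and how many distinct values each column holds. The sketch below shows one way to do that with `select_dtypes` and `nunique`. The `toy` table is a hypothetical stand-in, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV; all values are placeholders.
toy = pd.DataFrame({
    "year": [2011, 2011, 2012],
    "team": ["Team A", "Team A", "Team B"],
})

# Split column names by dtype: numeric vs. everything else.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
other_cols = toy.select_dtypes(exclude="number").columns.tolist()

print("Numeric columns:", numeric_cols)
print("Other columns:", other_cols)

# Number of distinct values per column (missing values are not counted).
print(toy.nunique())
```

A column with very few distinct values is usually a good candidate for `.value_counts()`, while a column with as many distinct values as rows is often an identifier.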



## 3️⃣ Explore Categorical Columns

Let's see which **teams**, **cars**, and **engine types** appear most often.


In [None]:

print("Top 5 Teams:")
print(df['team'].value_counts().head())
print("\nTop 5 Cars:")
print(df['car'].value_counts().head())
print("\nTop 5 Engine Types:")
print(df['engine_type'].value_counts().head())
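
Counts are often easier to compare as shares of the whole. As a rough sketch, `value_counts` can keep missing entries with `dropna=False`, and dividing by the number of rows turns the counts into proportions. The `toy` table is made up for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV; all values are placeholders.
toy = pd.DataFrame({"team": ["Team A", "Team A", "Team B", None]})

# dropna=False keeps missing entries as their own category.
counts = toy["team"].value_counts(dropna=False)

# Put counts and shares side by side.
summary = counts.to_frame(name="count")
summary["share"] = (counts / len(toy)).round(2)

print(summary)
```

Keeping the missing category visible means the shares add up to the full dataset rather than only the rows that happen to have a team recorded.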



## 4️⃣ Visualizing Data

Let's create some simple visualizations to better understand the dataset.


In [None]:

# Entries by year
df['year'].value_counts().sort_index().plot(kind='bar')
plt.title('Entries by Year')
plt.xlabel('Year')
plt.ylabel('Count')
plt.show()

# Top teams
df['team'].value_counts().head(10).plot(kind='barh')
plt.title('Top 10 Teams by Entries')
plt.xlabel('Count')
plt.ylabel('Team')
plt.show()
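
A horizontal bar chart draws the first row at the bottom, so a `value_counts()` result plotted directly puts the largest bar last. One possible tweak, sketched here on made-up data, is to sort the counts in ascending order before plotting and to draw on an explicit `Axes` object. The `toy` table and the figure size are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the real CSV; all values are placeholders.
toy = pd.DataFrame({
    "team": ["Team A", "Team A", "Team A", "Team B", "Team B", "Team C"],
})

# Take the top categories, then sort ascending so the largest bar ends up on top.
team_counts = toy["team"].value_counts().head(10).sort_values()

# Working with an explicit Axes keeps each chart's labels separate.
fig, ax = plt.subplots(figsize=(6, 3))
team_counts.plot(kind="barh", ax=ax)
ax.set_title("Top Teams by Entries (toy data)")
ax.set_xlabel("Count")
ax.set_ylabel("Team")
fig.tight_layout()
plt.show()
```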



## 5️⃣ Check for Patterns or Interesting Facts

Let's look for simple insights:
- Which year has the most records?
- Which team did he appear with most often?
- Are there many missing event details?


In [None]:

print("Year with most entries:", df['year'].mode()[0])
print("Most frequent team:", df['team'].mode()[0])
print("Missing 'event' values:", df['event'].isna().sum())
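
One thing to keep in mind: `.mode()` returns every tied value in sorted order, so `.mode()[0]` quietly picks the smallest one when there is a tie. The sketch below shows a way to list all tied values. The `toy` years are placeholders chosen to force a tie, not real data.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV; the years are placeholders.
toy = pd.DataFrame({"year": [2011, 2011, 2012, 2012, 2013]})

# mode() returns all most-frequent values, sorted ascending.
all_modes = toy["year"].mode().tolist()
print("All modes:", all_modes)

# The same answer via value_counts: keep every value that hits the top count.
year_counts = toy["year"].value_counts()
top_years = sorted(year_counts[year_counts == year_counts.max()].index.tolist())
print("Years tied for most entries:", top_years)
```

If the list has more than one entry, reporting only `.mode()[0]` would hide the tie.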



## 6️⃣ Wrap Up

We learned to:
- Inspect datasets with `.info()`, `.describe()`, and `.isna()`  
- Explore text columns with `.value_counts()`  
- Create basic bar plots to visualize distributions  

🎯 **Key takeaway:** EDA helps us **understand the story behind the data** before modeling or advanced analysis.
