# **Creating DataFrames**

We will create a DataFrame object from a dataset in a CSV file using the `read_csv` function of the pandas (`pd`) module. While `read_csv` can read a dataset directly from a web URL, it's best to download the file to your computer first so that you keep a local copy of the data.

# **About the Data**

We will use the dataset on 'Lionel Messi | All Club Goals' available at: https://www.kaggle.com/datasets/azminetoushikwasi/-lionel-messi-all-club-goals

# **Imports**

In [2]:
import datetime
import numpy as np
import pandas as pd

### Finding information on the file before reading it in
Before attempting to read in a file, we can use the command line to see important information about the file that may determine how we read it in. We can run command line code from Jupyter Notebooks (thanks to IPython) by using ! before the code.

### Number of lines (row count)

For example, we can find out how many lines are in the file by using the wc utility (word count) and counting lines in the file (-l). Run the cell below to confirm the file has 704 lines:

In [None]:
!wc -l data.csv
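
If `wc` is not available (for example on Windows), the line count can also be obtained in plain Python. The following is a minimal sketch: it writes a small throwaway file (`sample.csv`, a made-up name with made-up contents) so that it runs on its own. In the notebook you would open `data.csv` instead.

```python
import os
import tempfile

# Build a tiny throwaway CSV so the sketch is self-contained.
# The file name and its contents are hypothetical.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.csv")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("Season,Club\n04/05,FC Barcelona\n05/06,FC Barcelona\n")

# Iterate over the file and count its lines (the header row is included).
with open(sample_path, encoding="utf-8") as f:
    line_count = sum(1 for _ in f)

print(line_count)
```

Because the header is counted too, the number of data rows is one less than the line count.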

### Reading in the file
Our file is small, has headers in the first row, and is comma-separated, so we don't need to provide any additional arguments to read it in with `pd.read_csv()`.
For files with other delimiters, such as tabs (`\t`), we can still use `read_csv()` by setting the `sep` argument to the delimiter. For Excel files we can use `read_excel()`, and for JSON (JavaScript Object Notation) files, `read_json()`.
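
As a rough illustration of the `sep` argument, the sketch below reads tab-separated text. The text is made up and held in memory with `io.StringIO` so the example needs no file on disk; with a real file you would pass its path instead.

```python
import io

import pandas as pd

# Made-up tab-separated text standing in for a file on disk.
tsv_text = (
    "Season\tClub\tOpponent\n"
    "04/05\tFC Barcelona\tTeam A\n"
    "05/06\tFC Barcelona\tTeam B\n"
)

# read_csv accepts a path or a file-like object; sep names the delimiter.
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(tsv_df.shape)
print(list(tsv_df.columns))
```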

In [4]:
import pandas as pd
df = pd.read_csv('data.csv')

Let's review summary statistics for the 'Lionel Messi | All Club Goals' DataFrame, `df`, using `describe()`.

In [None]:
df.describe()

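`df.describe()` does not tell us much here: by default it is geared toward numeric columns, while the columns we care about in this dataset are mostly text. The `info()` method is more informative, since it lists every column with its data type and non-null count, as you can check by running the code cell below.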

In [None]:
df.info()
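
To get a summary of text columns as well, `describe()` accepts an `include` argument. The sketch below uses a toy DataFrame with made-up values (the column names are only illustrative) to show the difference between the default call and `include="all"`.

```python
import pandas as pd

# Toy frame with hypothetical values: one numeric and one text column.
toy = pd.DataFrame({
    "Minute": [90, 34, 12],
    "Club": ["FC Barcelona", "FC Barcelona", "Other Club"],
})

# Default: only the numeric column is summarized (count, mean, std, ...).
numeric_summary = toy.describe()

# include="all": text columns also get count, unique, top and freq.
full_summary = toy.describe(include="all")

print(numeric_summary.columns.tolist())
print(full_summary.loc[["unique", "top", "freq"], "Club"])
```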

We can use the dataframe `head()` or `tail()` method to view some actual entries. Without a numeric parameter, both methods return 5 entries!

**Run the next 2 code cells below to see the first 10 and the last 10 entries**

In [None]:
df.head(10)

In [None]:
df.tail(10)
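
The default of five rows and the effect of the numeric argument can be checked on a small throwaway DataFrame (the `numbers` frame below is made up for illustration):

```python
import pandas as pd

# Hypothetical frame with 20 rows numbered 0..19.
numbers = pd.DataFrame({"n": range(20)})

default_head = numbers.head()   # no argument: first 5 rows
last_three = numbers.tail(3)    # explicit argument: last 3 rows

print(len(default_head))
print(last_three["n"].tolist())
```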

# **Querying & Locating Data in the DataFrame**
1. One of the most useful tasks in pandas is locating data that satisfies given criteria. For example, we can find the season and the calendar year in which Messi scored the most goals for his clubs. Because each row of the dataset is one goal, counting rows per season or per year gives the goal totals, which lets us trace his goal scoring from his early years to the peak of his career.

In [None]:
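# Each row is one goal, so counting rows per season gives goals per season.
goals_per_season = df['Season'].value_counts()

# idxmax() returns the season label with the highest count; max() returns that count.
most_goals_season = goals_per_season.idxmax()
most_goals_count = goals_per_season.max()

print(f"Season with the most goals: {most_goals_season}")
print(f"Number of goals in the season: {most_goals_count}")

# Extract the calendar year; dates that cannot be parsed become NaT instead of raising.
df['Year'] = pd.to_datetime(df['Date'], errors='coerce').dt.year

# Repeat the same count, this time per calendar year.
goals_per_year = df['Year'].value_counts()

most_goals_year = goals_per_year.idxmax()
most_goals_count = goals_per_year.max()

print(f"Year with the most goals: {most_goals_year}")
print(f"Number of goals in the year: {most_goals_count}")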
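
The pattern used above (`value_counts()` followed by `idxmax()` and `max()`) can be tried on a toy DataFrame. The rows below are made up, including one deliberately invalid date to show what `errors='coerce'` does.

```python
import pandas as pd

# Hypothetical rows: one row per goal, mirroring the dataset's layout.
toy = pd.DataFrame({
    "Season": ["11/12", "11/12", "12/13", "11/12"],
    "Date": ["2011-09-17", "2012-03-07", "not a date", "2012-05-05"],
})

season_counts = toy["Season"].value_counts()   # rows (goals) per season
top_season = season_counts.idxmax()            # label with the highest count
top_count = season_counts.max()                # the highest count itself

# errors="coerce" turns the unparseable date into NaT instead of raising.
years = pd.to_datetime(toy["Date"], errors="coerce").dt.year
year_counts = years.value_counts()             # missing years are dropped by default
top_year = year_counts.idxmax()

print(top_season, top_count)
print(top_year, year_counts.max())
```

One thing to watch for: when any date fails to parse, the year column holds a missing value and pandas stores the years as floats, so the top year may print as `2012.0` rather than `2012`.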

**Lionel Messi holds the record for the most goals scored in a single season and in a calendar year. CRAZYYY!!!**

2. We can now examine how Messi's goals are distributed across the clubs he played for.

In [None]:
goals_per_club = df.groupby('Club').size()

print("Total number of goals scored per club:")
print(goals_per_club)
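
`groupby('Club').size()` and `value_counts()` produce the same totals; they differ mainly in how the result is ordered. A small sketch with made-up rows:

```python
import pandas as pd

# Hypothetical rows; "Other Club" is a placeholder name.
toy = pd.DataFrame({"Club": ["Other Club", "FC Barcelona", "FC Barcelona"]})

by_group = toy.groupby("Club").size()    # ordered by club name
by_count = toy["Club"].value_counts()    # ordered by count, largest first

print(by_group)
print(by_count)
```

When the goal is a ranking, `value_counts()` saves a separate `sort_values()` call.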

**He is also the all-time top scorer for FC Barcelona and in La Liga.**

3. We should have started with this query, but better late than never: let's find the total number of goals Lionel Messi has scored at club level. Since each row is one goal, this is simply the number of rows.

In [None]:
total_goals = len(df)

print("Total number of goals scored by Lionel Messi at club level:")
print(total_goals)

4. Now, let's find the 10 opponents Messi has scored the most goals against.

In [None]:
goals_per_opponent = df['Opponent'].value_counts()

top_10_opponents = goals_per_opponent.head(10)

print("Top 10 opponents Messi has scored the most goals against:")
print(top_10_opponents)

5. Are you eager to know how many left-footed goals, right-footed goals, headers, direct free kicks, penalties, etc., Messi has scored? Let's find out.

In [None]:
goals_per_type = df['Type'].value_counts()

print("Goals scored by different types:")
print(goals_per_type)

**His finishing is mesmerizing, especially the free kicks and solo runs. You would jump out of your seat watching them.**

6. Let's highlight Messi's contributions in crucial competitions such as the UEFA Champions League, domestic cups, and other pivotal matches.

In [None]:
specific_competitions = ['UEFA Champions League', 'Copa del Rey', 'Supercopa', 'FIFA Club World Cup', 'UEFA Super Cup']

specific_competitions_goals = df[df['Competition'].isin(specific_competitions)]

total_goals_by_competition = specific_competitions_goals.groupby('Competition').size().sort_values(ascending=False)

print("Total goals scored by Messi in key competitions:")
print(total_goals_by_competition)
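
`isin()` matches names exactly, so a competition that is spelled differently in the data simply contributes no rows, without any warning. The sketch below, on made-up rows, shows the filter and one possible way to spot names that matched nothing (the `missing` check is an addition for illustration, not part of the analysis above).

```python
import pandas as pd

# Hypothetical rows; "Other Competition" is a placeholder name.
toy = pd.DataFrame({
    "Competition": [
        "Other Competition",
        "UEFA Champions League",
        "Copa del Rey",
        "UEFA Champions League",
    ]
})

wanted = ["UEFA Champions League", "Copa del Rey", "Supercopa"]

# Boolean mask: True where the row's competition is one of the wanted names.
mask = toy["Competition"].isin(wanted)
counts = toy[mask].groupby("Competition").size().sort_values(ascending=False)

# Names in the list that never occur in the data.
missing = sorted(set(wanted) - set(toy["Competition"]))

print(counts)
print(missing)
```

Comparing the list against `df['Competition'].unique()` before filtering is a quick way to confirm the spellings used in the real dataset.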

7. I am very excited to find out how many goals Leo has scored in club finals.

In [None]:
df['Matchday'] = df['Matchday'].str.lower()

final_match_day = 'final'

final_match_day_goals = df[df['Matchday'] == final_match_day]

total_goals_by_competition = final_match_day_goals.groupby('Competition').size().sort_values(ascending=False)

print("Goals scored by Messi in club finals:")
print(total_goals_by_competition)

print("\nTotal goals scored by Leo in club finals:")
print(total_goals_by_competition.sum())
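
The cell above overwrites the `Matchday` column with its lowercase version. If you would rather keep the original values, one alternative is to lowercase only inside the comparison. A minimal sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows with inconsistent capitalization.
toy = pd.DataFrame({
    "Matchday": ["Final", "34", "final", "Semi-Finals"],
    "Competition": ["Copa del Rey", "Other Competition", "UEFA Super Cup", "Copa del Rey"],
})

# Lowercase a temporary copy for the comparison; the column itself is untouched.
is_final = toy["Matchday"].str.lower() == "final"
finals = toy[is_final]

print(finals)
print(toy["Matchday"].tolist())
```

Note that an exact comparison with `'final'` does not match values such as `'Semi-Finals'`, which is the intended behavior here.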

**Leo Messi also holds the record for the most finals played and the most goals scored in a final. He cannot stop winning and scoring.**

8. Let's see who the main assist providers to Messi were.

In [None]:
top_10_assist_providers = df['Goal_assist'].value_counts().head(10)

print("Top 10 assist providers to Messi:")
print(top_10_assist_providers)
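
By default `value_counts()` ignores missing values, so any goal without a recorded assist is left out of the ranking. Assuming unassisted goals appear as missing values in `Goal_assist` (an assumption about the dataset, not something verified here), passing `dropna=False` makes them visible. A sketch with made-up player names:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the player names are placeholders.
toy = pd.DataFrame({
    "Goal_assist": ["Player A", np.nan, "Player A", "Player B", np.nan, np.nan]
})

assisted = toy["Goal_assist"].value_counts()                  # missing values skipped
with_missing = toy["Goal_assist"].value_counts(dropna=False)  # missing values counted too

print(assisted)
print(with_missing)
```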

**All the names above helped Leo to become what he is today.**

# **THANK YOU**