# World Cup Matches - Notebook
This notebook will guide you through:
- Loading and inspecting the dataset with Pandas
- Selecting specific rows and columns
- Filtering data with boolean masks
- Adding new columns and modifying existing ones

## Objectives
1. Access information about a dataset with pandas methods.
2. Select rows and columns using `.loc` and `.iloc`.
3. Use boolean indexing to filter data.
4. Add and modify columns in a DataFrame.

In [None]:
import pandas as pd

# Load the dataset
df = pd.read_csv('WorldCupMatches.csv')

# Display the first few rows
df.head()

## Basic Data Inspection
We'll look at the last rows, data info, shape, and column names.

In [None]:
# Display last rows
df.tail()

In [None]:
# Get a concise summary of the data
df.info()

# Uncomment for summary statistics of the numeric columns
# df.describe()

In [None]:
# Get the shape (rows, columns)
df.shape


In [None]:
# Get column names
df.columns

## Selecting Rows and Columns
Use `.iloc` and `.loc` for slicing and indexing.

In [None]:
# Select rows at positions 3 to 5 (the end position, 6, is excluded)
df.iloc[3:6]


In [None]:
# Select rows labeled 5 to 9 (both ends included), only 'Home Team Name' and 'Away Team Name'
# df.loc[row_selector, column_selector]
df.loc[5:9, ['Home Team Name', 'Away Team Name']]
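
The two cells above slice differently at the end point. A minimal sketch on made-up data (the `toy` DataFrame and its `team` column are illustrative, not part of the World Cup dataset) shows the difference when the index is the default row numbers:

```python
import pandas as pd

# Toy data (made up for illustration), with the default 0..n-1 index
toy = pd.DataFrame({"team": ["A", "B", "C", "D", "E", "F"]})

by_position = toy.iloc[2:4]   # positions 2 and 3; the end position is excluded
by_label = toy.loc[2:4]       # labels 2, 3 and 4; the end label is included

print(len(by_position))  # 2
print(len(by_label))     # 3
```

The same `2:4` slice returns two rows with `.iloc` and three with `.loc`, which is why the earlier cells use `3:6` and `5:9` respectively.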

## loc vs. iloc in Pandas (Very Simple Explanation)
An index in Pandas is a set of labels, one per row, that lets you locate and access data quickly. Labels are usually unique, but Pandas does not require it.
If you don't specify an index when you create a DataFrame, Pandas assigns a default one: row numbers starting from 0.

**iloc** is integer position-based.
- You use it when you want to select rows or columns by their position (like counting from 0, 1, 2, …).

Example:
- To get the first row (regardless of its label), you can use `df.iloc[0]`.

**loc** is label-based.  
You use it when you want to select rows or columns by their names (the labels in the index or column headers).
To look rows up by the values of a specific column (like "Name"), first make that column the index with `df.set_index('Name')`.

**Example:**  
- If your DataFrame has a row labeled "Alice" (that is, "Alice" is in the index), you can get that row with `df.loc["Alice"]`.
- If "Name" is still a regular column, filter with a boolean mask instead: `df_Alice = df[df['Name'] == "Alice"]`.
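
To make the "Alice" example concrete, here is a small sketch on hypothetical data. The `people` DataFrame and its `Name` and `Score` columns are invented for illustration and do not come from the World Cup dataset.

```python
import pandas as pd

# Hypothetical data; 'Name' and 'Score' are illustrative column names
people = pd.DataFrame({"Name": ["Bob", "Alice", "Cara"], "Score": [70, 85, 90]})

# Boolean mask: works while 'Name' is a regular column
alice_rows = people[people["Name"] == "Alice"]

# Label lookup: 'Name' has to become the index first
by_name = people.set_index("Name")
alice_by_label = by_name.loc["Alice"]   # a Series holding Alice's row
first_by_position = by_name.iloc[0]     # Bob's row, whatever its label is

print(alice_by_label["Score"])     # 85
print(first_by_position["Score"])  # 70
```

Note that the mask returns a DataFrame (possibly with several rows), while `.loc` with a single unique label returns one row as a Series.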


## Boolean Masking
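Boolean masking is a powerful technique in Pandas that allows you to filter rows in a DataFrame based on True/False conditions.
- Filter rows based on multiple conditions (`&`, `|`, `~` for AND, OR, NOT).
- Wrap each condition in parentheses, because `&` and `|` bind more tightly than comparisons like `==`.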


In [None]:
# Example: Find all games in Group 3 for the 1950 World Cup
df_1950_group3 = df[(df['Year'] == 1950) & (df['Stage'] == 'Group 3')]
df_1950_group3

# Display only the attendance column for these filtered rows
df_1950_group3['Attendance']
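
The cell above combines conditions with `&`. As a rough sketch of the other two operators, the example below uses made-up rows that only borrow the `Year` and `Stage` column names. The values are invented and are not taken from the dataset.

```python
import pandas as pd

# Made-up rows that reuse the notebook's 'Year' and 'Stage' column names
matches = pd.DataFrame({
    "Year": [1950, 1950, 1954, 1954],
    "Stage": ["Group 3", "Final", "Group 1", "Final"],
})

# Storing each mask in a variable keeps combined conditions readable
is_final = matches["Stage"] == "Final"
is_1950 = matches["Year"] == 1950

either = matches[is_final | is_1950]   # OR: finals, or any 1950 match
not_final = matches[~is_final]         # NOT: everything except finals
both = matches[is_final & is_1950]     # AND: rows that satisfy both

print(len(either), len(not_final), len(both))  # 3 2 1
```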

## Creating and Modifying Columns
Here, we'll create a "Total Goals" column and show how to modify certain values.

In [None]:
# Create 'Total Goals' column
df['Total Goals'] = df['Home Team Goals'] + df['Away Team Goals']

# Create 'Half-time Goals' column (sum of home and away half-time goals)
df['Half-time Goals'] = df['Half-time Home Goals'] + df['Half-time Away Goals']

# Check updated columns
df[['Home Team Name','Away Team Name','Total Goals','Half-time Goals']].head()
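
Column arithmetic like this is element-wise, and a missing value on either side produces a missing result. The sketch below uses made-up scores (not rows from the dataset) to show that behavior, plus a hypothetical derived column named `High Scoring`.

```python
import pandas as pd

# Made-up scores; None marks a row with no recorded result
scores = pd.DataFrame({
    "Home Team Goals": [4, 3, None],
    "Away Team Goals": [1, 0, None],
})

# Element-wise addition; a missing value on either side gives NaN
scores["Total Goals"] = scores["Home Team Goals"] + scores["Away Team Goals"]

# A derived boolean column built from the new one (NaN compares as False)
scores["High Scoring"] = scores["Total Goals"] >= 4

print(scores)
```

If the real file contains rows with missing scores, `df['Total Goals'].isna().sum()` is one way to count how many totals could not be computed.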

In [None]:
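# Example of modifying entries that contain 'Korea'
# Matching on 'Korea' alone cannot tell the two Korean teams apart, so each full name is mapped explicitly.
# Assumption: the dataset uses the names 'Korea DPR' and 'Korea Republic'; verify with df['Home Team Name'].unique().
korea_names = {'Korea DPR': 'North Korea', 'Korea Republic': 'South Korea'}
df['Home Team Name'] = df['Home Team Name'].replace(korea_names)
df['Away Team Name'] = df['Away Team Name'].replace(korea_names)

# Check updated entries (na=False treats missing team names as non-matches)
df.loc[df['Home Team Name'].str.contains('Korea', na=False), ['Home Team Name']]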


In [None]:
# na=False treats missing team names as non-matches
df.loc[df['Away Team Name'].str.contains('Korea', na=False), ['Away Team Name']]
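
String matching and renaming can be tried out safely on a small frame first. The sketch below is illustrative only: the `teams` DataFrame is made up, and the names 'Korea DPR' and 'Korea Republic' are assumed spellings that should be checked against the real column values.

```python
import pandas as pd

# Made-up names; the None entry stands in for a row with a missing team name
teams = pd.DataFrame({"Home Team Name": ["Korea DPR", "Brazil", None, "Korea Republic"]})

# na=False turns the missing entry into False, so the mask is purely boolean
mask = teams["Home Team Name"].str.contains("Korea", na=False)

# Rename each team explicitly instead of assigning one value to every match
renames = {"Korea DPR": "North Korea", "Korea Republic": "South Korea"}
teams["Home Team Name"] = teams["Home Team Name"].replace(renames)

print(mask.tolist())  # [True, False, False, True]
print(teams)
```

A dictionary passed to `replace` only touches exact matches, so other team names and missing values are left unchanged.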

In [None]:
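# City values may carry stray spaces (note the trailing space in 'Puebla '),
# so strip whitespace before comparing instead of hard-coding it
df_Puebla = df[df['City'].str.strip() == 'Puebla']
df_Puebla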
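
Text columns read from CSV files sometimes carry stray leading or trailing spaces, which makes an exact comparison fail silently. A minimal sketch on made-up values (the `venues` DataFrame is hypothetical) shows the effect and one way around it:

```python
import pandas as pd

# Made-up city values, two of them with stray whitespace
venues = pd.DataFrame({"City": ["Puebla ", "Toluca", " Puebla"]})

exact = venues[venues["City"] == "Puebla"]                 # no rows: the spaces prevent a match
stripped = venues[venues["City"].str.strip() == "Puebla"]  # both Puebla rows

print(len(exact), len(stripped))  # 0 2
```

To clean the column once rather than at every comparison, the stripped values can be assigned back with `venues["City"] = venues["City"].str.strip()`.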

## Summary
In this notebook, we covered:
1. Loading data with Pandas.
2. Inspecting the dataset.
3. Selecting rows and columns.
4. Boolean filtering.
5. Creating and modifying columns.

Feel free to experiment with other Pandas methods to learn more!