# Olympic Worlds
---

Olympians

Team members:
- Salman Bader Al-Haddad
- Sayed Hussain Asaad Almukhtar

## Introduction 
__Introduction to the topic__ 

The Olympic Games began in ancient Greece as contests of strength and honor. Over centuries, they grew from local traditions into today’s global event, uniting nations through sport.

Since the first modern Games in Athens in 1896, the Olympics have expanded to include thousands of athletes, hundreds of events, and both Summer and Winter editions.

Beyond competition, the Olympics carry cultural and economic weight: winning medals boosts national pride, strengthens global reputation, and justifies investment in sports development. This makes them a vital priority for countries aiming to build both athletic and social success.

---

## Problem Statement

The Bahrain Olympic Committee is seeking data-driven insights to guide investment in future sports facilities and programs. Using historical Olympic data on athletes, demographics, and medal achievements, this project aims to identify which sports are most aligned with Bahrain’s regional and cultural context, as well as underserved sports where targeted investments (e.g., swimming pools, basketball courts, training facilities) could maximize Bahrain’s chances of success in upcoming Olympic Games.

## Objectives:
__Questions that will guide the analysis to solve the problem__

    ...

---

## Exploratory Data Analysis (EDA):

### Data Info:
__Getting the data and exploring it (includes descriptive statistics)__

#### Sayed Hussain

In [2]:
# import data manipulation modules
import pandas as pd
import numpy as np
import csv

# import libraries for visualization
import matplotlib.pyplot as plt
import seaborn as sns

In [11]:
# Define the data paths
olympic_events_path = 'Data/athlete_events.csv'
olympic_regions_path = 'Data/noc_regions.csv'

# Load the files
olympic_events_df = pd.read_csv(olympic_events_path)
olympic_regions_df = pd.read_csv(olympic_regions_path)

display(olympic_events_df.head())
display(olympic_regions_df.head())

Unnamed: 0,ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,City,Sport,Event,Medal
0,1,A Dijiang,male,24.0,180.0cm,80.0kg,China,CHN,1992 Summer,Barcelona,Basketball,Basketball Men's Basketball,
1,2,A Lamusi,Male,23.0,170.0cm,60.0kg,China,CHN,2012 Summer,London,Judo,Judo Men's Extra-Lightweight,
2,3,Gunnar Nielsen Aaby,Male,24.0,,,Denmark,DEN,1920 Summer,Antwerpen,Football,Football Men's Football,
3,4,Edgar Lindenau Aabye,M,34.0,,,Denmark/Sweden,DEN,1900 Summer,Paris,Tug-Of-War,Tug-Of-War Men's Tug-Of-War,Gold
4,5,Christine Jacoba Aaftink,FeMale,21.0,185.0cm,82.0kg,Netherlands,NED,1988 Winter,Calgary,Speed Skating,Speed Skating Women's 500 metres,


Unnamed: 0,NOC,region,notes
0,AFG,Afghanistan,
1,AHO,Curacao,Netherlands Antilles
2,ALB,Albania,
3,ALG,Algeria,
4,AND,Andorra,

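The preview above shows the `Sex` column spelled several ways (`male`, `Male`, `M`, `FeMale`). A minimal sketch of one way to collapse these variants follows. The helper name `normalize_sex`, the choice of `M`/`F` as target labels, and the small `sample_df` stand-in are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Small stand-in for olympic_events_df (assumption: the real frame is loaded
# from athlete_events.csv as in the cell above).
sample_df = pd.DataFrame({"Sex": ["male", "Male", "M", "FeMale", "F", None]})


def normalize_sex(series: pd.Series) -> pd.Series:
    """Collapse spelling variants of the Sex column to 'M' or 'F'."""
    cleaned = series.str.strip().str.lower()
    mapping = {"m": "M", "male": "M", "f": "F", "female": "F"}
    # Values outside the mapping (including missing ones) become NaN,
    # so unexpected spellings stay visible instead of being guessed.
    return cleaned.map(mapping)


sample_df["Sex"] = normalize_sex(sample_df["Sex"])
print(sample_df["Sex"].value_counts(dropna=False))
```

Running `value_counts(dropna=False)` on the real column before and after the mapping would show whether any spellings were missed.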

In [12]:
# Check the data types of both DataFrames

print("olympic_events_df dtypes:\n", olympic_events_df.dtypes, "\n")
print("olympic_regions_df dtypes:\n", olympic_regions_df.dtypes, "\n")


olympic_events_df dtypes:
 ID          int64
Name       object
Sex        object
Age       float64
Height     object
Weight     object
Team       object
NOC        object
Games      object
City       object
Sport      object
Event      object
Medal      object
dtype: object 

olympic_regions_df dtypes:
 NOC       object
region    object
notes     object
dtype: object 



In [13]:
# Strip the 'cm' suffix from Height so the column can later be converted to a number
olympic_events_df['Height'] = olympic_events_df['Height'].str.replace('cm', '')

In [14]:
# Check the result (values are still strings at this point)
olympic_events_df['Height']

0         180.0
1         170.0
2           NaN
3           NaN
4         185.0
          ...  
271111    179.0
271112    176.0
271113    176.0
271114    185.0
271115    185.0
Name: Height, Length: 271116, dtype: object

In [18]:
# Do the same for Weight by stripping the 'kg' suffix
olympic_events_df['Weight'] = olympic_events_df['Weight'].str.replace('kg', '')
olympic_events_df['Weight']

0         80.0
1         60.0
2          NaN
3          NaN
4         82.0
          ... 
271111    89.0
271112    59.0
271113    59.0
271114    96.0
271115    96.0
Name: Weight, Length: 271116, dtype: object

In [19]:
# Convert Height and Weight from strings to floats
olympic_events_df['Height'] = olympic_events_df['Height'].astype(float)
olympic_events_df['Weight'] = olympic_events_df['Weight'].astype(float)

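The strip-then-cast steps above can also be sketched as a single reusable helper. The function name `unit_column_to_float` and the sample rows are hypothetical. `pd.to_numeric(..., errors="coerce")` is used here instead of `astype(float)` so that any value that still is not numeric becomes `NaN` rather than raising an error; that is a design choice of this sketch, not the notebook's method.

```python
import pandas as pd


def unit_column_to_float(series: pd.Series, unit: str) -> pd.Series:
    """Strip a literal unit suffix such as 'cm' or 'kg' and return floats."""
    stripped = series.str.replace(unit, "", regex=False).str.strip()
    # errors="coerce" turns anything that is still not a number into NaN.
    return pd.to_numeric(stripped, errors="coerce")


# Stand-in rows copied from the preview of athlete_events.csv.
sample_df = pd.DataFrame({
    "Height": ["180.0cm", "170.0cm", None, "185.0cm"],
    "Weight": ["80.0kg", "60.0kg", None, "82.0kg"],
})
sample_df["Height"] = unit_column_to_float(sample_df["Height"], "cm")
sample_df["Weight"] = unit_column_to_float(sample_df["Weight"], "kg")
print(sample_df.dtypes)
```

Comparing `isna().sum()` before and after the conversion would reveal whether coercion silently dropped any malformed values.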
In [20]:
# Verify that Height and Weight are now float64

print("olympic_events_df dtypes:\n", olympic_events_df.dtypes, "\n")
print("olympic_regions_df dtypes:\n", olympic_regions_df.dtypes, "\n")


olympic_events_df dtypes:
 ID          int64
Name       object
Sex        object
Age       float64
Height    float64
Weight    float64
Team       object
NOC        object
Games      object
City       object
Sport      object
Event      object
Medal      object
dtype: object 

olympic_regions_df dtypes:
 NOC       object
region    object
notes     object
dtype: object 



In [24]:
# List the distinct NOC codes that appear in the events data
olympic_events_df['NOC'].unique()

array(['CHN', 'DEN', 'NED', 'USA', 'FIN', 'NOR', 'ROU', 'EST', 'FRA',
       'MAR', 'ESP', 'EGY', 'IRI', 'BUL', 'ITA', 'CHA', 'AZE', 'SUD',
       'RUS', 'ARG', 'CUB', 'BLR', 'GRE', 'CMR', 'TUR', 'CHI', 'MEX',
       'URS', 'NCA', 'HUN', 'NGR', 'ALG', 'KUW', 'BRN', 'PAK', 'IRQ',
       'UAR', 'LIB', 'QAT', 'MAS', 'GER', 'CAN', 'IRL', 'AUS', 'RSA',
       'ERI', 'TAN', 'JOR', 'TUN', 'LBA', 'BEL', 'DJI', 'PLE', 'COM',
       'KAZ', 'BRU', 'IND', 'KSA', 'SYR', 'MDV', 'ETH', 'UAE', 'YAR',
       'INA', 'PHI', 'SGP', 'UZB', 'KGZ', 'TJK', 'EUN', 'JPN', 'CGO',
       'SUI', 'BRA', 'FRG', 'GDR', 'MON', 'ISR', 'URU', 'SWE', 'ISV',
       'SRI', 'ARM', 'CIV', 'KEN', 'BEN', 'UKR', 'GBR', 'GHA', 'SOM',
       'LAT', 'NIG', 'MLI', 'AFG', 'POL', 'CRC', 'PAN', 'GEO', 'SLO',
       'CRO', 'GUY', 'NZL', 'POR', 'PAR', 'ANG', 'VEN', 'COL', 'BAN',
       'PER', 'ESA', 'PUR', 'UGA', 'HON', 'ECU', 'TKM', 'MRI', 'SEY',
       'TCH', 'LUX', 'MTN', 'CZE', 'SKN', 'TTO', 'DOM', 'VIN', 'JAM',
       'LBR', 'SUR',

In [22]:
# Look up Bahrain's NOC code (BRN) in the regions table
olympic_regions_df[olympic_regions_df['NOC'] == 'BRN']

Unnamed: 0,NOC,region,notes
30,BRN,Bahrain,

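Since both tables share the `NOC` column, one possible next step is to join them so each event row carries its region name. The sketch below uses a left merge with `indicator=True` to flag NOC codes that have no match in the regions table. The frames `events` and `regions` are tiny made-up stand-ins (including the code `XXX` and the Bahrain row), not real rows from the dataset.

```python
import pandas as pd

# Made-up stand-ins for olympic_events_df and olympic_regions_df.
events = pd.DataFrame({
    "ID": [1, 2, 3],
    "NOC": ["CHN", "BRN", "XXX"],  # "XXX" is a fictional code with no match
    "Sport": ["Basketball", "Athletics", "Judo"],
})
regions = pd.DataFrame({
    "NOC": ["CHN", "BRN"],
    "region": ["China", "Bahrain"],
})

# A left merge keeps every event row; _merge records whether a region was found.
merged = events.merge(regions, on="NOC", how="left", indicator=True)

# NOC codes present in the events data but missing from the regions table.
unmatched = merged.loc[merged["_merge"] == "left_only", "NOC"].unique()

# Rows for Bahrain only.
bahrain = merged[merged["NOC"] == "BRN"]

print(unmatched)
print(bahrain)
```

Checking `unmatched` on the real data before filtering would show whether any NOC codes need manual mapping.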

#### Salman

### Data Handling: 
__Cleaning, transforming, and combining data__

#### Sayed Hussain

#### Salman

### Analysis: 
__Answering the objectives through data analysis__



#### Sayed Hussain

#### Salman

---

## Summary
__Summarizing the key insights from the analysis__

**Note**: _Use Bullet Points_

    ...

## Recommendations/Conclusion
**Note**: _Use Bullet Points_

    ...