# Lab Instructions

Find a dataset that interests you. I'd recommend starting on [Kaggle](https://www.kaggle.com/). Read through all of the material about the dataset and download a .CSV file.

1. Write a short summary of the data.  Where did it come from?  How was it collected?  What are the features in the data?  Why is this dataset interesting to you?  

2. Identify 5 interesting questions about your data that you can answer using Pandas methods.  

3. Answer those questions!  You may use any method you want (including LLMs) to help you write your code; however, you should use Pandas to find the answers.  LLMs will not always write code in this way without specific instruction.  

4. Write the answer to your question in a text box underneath the code you used to calculate the answer.



# Lab Work
## Summary
### Global EV Charging Stations
This dataset, from [Kaggle](https://www.kaggle.com/datasets/tarekmasryo/global-ev-charging-stations/data), contains 242,418 rows, each describing an electric vehicle charging station somewhere in the world. The data was pulled from [Open Charge Map](https://map.openchargemap.io/#/search) through an API. The features I will focus on are the country_code, city, ports, power_kw, and power_class columns. I found this dataset interesting because one of my biggest worries about electric vehicle adoption is the lack of charging infrastructure relative to that of traditional combustion-engine vehicles.
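
Before answering the questions, it can help to check the columns of interest for missing values and data types, since every query below filters on notna(). The following is a minimal sketch that uses a small made-up DataFrame in place of the real CSV; only the column names come from the dataset, and the rows are invented for illustration.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the dataset.
sample = pd.DataFrame({
    "country_code": ["GB", "US", "NO", None],
    "city": ["London", "Los Angeles", "OSLO", None],
    "ports": [2, 4, None, 1],
    "power_kw": [7.0, 50.0, 150.0, None],
    "power_class": ["AC_L1_(<7.5kW)", "DC_FAST_(50-149kW)",
                    "DC_ULTRA_(>=150kW)", "UNKNOWN"],
})

columns_of_interest = ["country_code", "city", "ports", "power_kw", "power_class"]

# Count missing values per column to see how much filtering each question needs.
missing_per_column = sample[columns_of_interest].isna().sum()
print(missing_per_column)

# Data types show whether the numeric columns were parsed as numbers.
print(sample[columns_of_interest].dtypes)
```

On the real data, the same two calls could be run on df right after pd.read_csv.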
## Questions
1. What are the Top 10 cities with the most charging stations?
2. What are the Top 10 cities with the most available ports?
3. What is the distribution of charging stations by power class?
4. What is the distribution of charging ports by power class?
5. Which cities have the most fast chargers?

## What are the Top 10 cities with the most charging stations?

In [12]:
import pandas as pd
df = pd.read_csv('charging_stations_2025_world.csv')

stations_per_city = (df[df['city'].notna()]
                        .groupby('city')
                        .size()
                        .sort_values(ascending=False))
    
print("Top 10 cities by number of charging stations:")
for i, (city, count) in enumerate(stations_per_city.head(10).items(), 1):
    print(f"{i:2}. {city}: {count:,} stations")

Top 10 cities by number of charging stations:
 1. London: 7,665 stations
 2. Los Angeles: 2,128 stations
 3. Hammersmith: 1,118 stations
 4. Montréal: 991 stations
 5. Berlin: 827 stations
 6. Toronto: 767 stations
 7. San Diego: 758 stations
 8. Atlanta: 681 stations
 9. Austin: 650 stations
10. Hamburg: 627 stations



## What are the Top 10 cities with the most available ports?

In [11]:
ports_per_city = (df[(df['city'].notna()) & (df['ports'].notna())]
                     .groupby('city')['ports']
                     .sum()
                     .sort_values(ascending=False))
    
print("Top 10 cities by total number of ports:")
for i, (city, total_ports) in enumerate(ports_per_city.head(10).items(), 1):
    print(f"{i:2}. {city}: {total_ports:,} ports")

Top 10 cities by total number of ports:
 1. London: 9,872 ports
 2. OSLO: 3,836 ports
 3. STOCKHOLM: 3,429 ports
 4. GÖTEBORG: 3,321 ports
 5. Los Angeles: 2,269 ports
 6. Madrid: 1,871 ports
 7. Berlin: 1,354 ports
 8. Toronto: 1,192 ports
 9. Hamburg: 1,171 ports
10. Barcelona: 1,149 ports
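
The mixed capitalization in this list (OSLO, STOCKHOLM, GÖTEBORG next to London and Madrid) suggests that city names are not spelled consistently in the source data, so one city may be split across several groups. One possible cleanup, sketched here on invented rows and not applied to the results above, is to normalize whitespace and case before grouping. The city_clean column name is an assumption made for illustration.

```python
import pandas as pd

# Invented rows showing the same city under different spellings.
sample = pd.DataFrame({
    "city": ["Oslo", "OSLO", " oslo ", "London", "LONDON", None],
    "ports": [2, 10, 4, 6, 1, 3],
})

# Strip surrounding whitespace and apply title case so variants share one key.
# Missing city names stay missing and are dropped by groupby.
sample["city_clean"] = sample["city"].str.strip().str.title()

ports_per_city = (sample.groupby("city_clean")["ports"]
                        .sum()
                        .sort_values(ascending=False))
print(ports_per_city)
```

Title case is a rough rule: names containing apostrophes come out oddly capitalized, and it does not merge districts such as Hammersmith into their parent city.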



## What is the distribution of charging stations by power class?

In [13]:
power_class_counts = (df[df['power_class'].notna()]
                         .groupby('power_class')
                         .size()
                         .sort_values(ascending=False))
    
# Denominator is every row in the dataset, including any with a missing power_class
total_stations = len(df)
    
print("Charging station power class distribution:")
for i, (power_class, count) in enumerate(power_class_counts.items(), 1):
    percentage = (count / total_stations) * 100
    print(f"{i}. {power_class}: {count:,} stations ({percentage:.1f}%)")

Charging station power class distribution:
1. AC_L1_(<7.5kW): 107,144 stations (44.2%)
2. AC_HIGH_(22-49kW): 55,542 stations (22.9%)
3. DC_FAST_(50-149kW): 37,329 stations (15.4%)
4. AC_L2_(7.5-21kW): 24,238 stations (10.0%)
5. DC_ULTRA_(>=150kW): 13,504 stations (5.6%)
6. UNKNOWN: 4,660 stations (1.9%)
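
The same kind of distribution can also be computed with value_counts, which returns counts and, with normalize=True, shares. This is a sketch on invented rows with shortened class labels, not a replacement for the code above. One difference to keep in mind: value_counts drops missing values by default, so its shares are relative to rows with a known power_class, whereas the code above divides by all rows.

```python
import pandas as pd

# Invented rows; the class labels are shortened for readability.
sample = pd.DataFrame({
    "power_class": ["AC_L1", "AC_L1", "DC_FAST", "UNKNOWN", None],
})

# Counts per class, most common first (missing values are excluded).
counts = sample["power_class"].value_counts()

# Shares of the non-missing rows, expressed as percentages.
shares = sample["power_class"].value_counts(normalize=True) * 100

summary = pd.DataFrame({"stations": counts, "percent": shares.round(1)})
print(summary)
```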



## What is the distribution of charging ports by power class?

In [14]:
valid_data = df[(df['power_class'].notna()) & (df['ports'].notna())]
ports_by_power_class = (valid_data.groupby('power_class')['ports']
                           .sum()
                           .sort_values(ascending=False))
    
total_ports = valid_data['ports'].sum()
    
print("Port power class distribution (by total ports):")
for i, (power_class, port_count) in enumerate(ports_by_power_class.items(), 1):
    percentage = (port_count / total_ports) * 100
    print(f"{i}. {power_class}: {port_count:,} ports ({percentage:.1f}%)")

Port power class distribution (by total ports):
1. AC_L1_(<7.5kW): 149,544 ports (31.5%)
2. AC_HIGH_(22-49kW): 118,632 ports (25.0%)
3. DC_FAST_(50-149kW): 74,533 ports (15.7%)
4. DC_ULTRA_(>=150kW): 74,005 ports (15.6%)
5. AC_L2_(7.5-21kW): 48,347 ports (10.2%)
6. UNKNOWN: 9,901 ports (2.1%)
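
Comparing this output with the station counts from the previous question suggests that some classes have more ports per station than others: DC_ULTRA accounts for 5.6% of stations but 15.6% of ports. One way to make that explicit is to compute the mean number of ports per station for each class. The sketch below uses invented rows and shortened class labels, and the result column names (stations, total_ports, mean_ports) are assumptions for illustration.

```python
import pandas as pd

# Invented rows; one station has no port count and is excluded.
sample = pd.DataFrame({
    "power_class": ["AC_L1", "AC_L1", "DC_ULTRA", "DC_ULTRA", "DC_FAST"],
    "ports": [1, 2, 8, 4, None],
})

# Keep only rows where both the class and the port count are known.
valid = sample.dropna(subset=["power_class", "ports"])

# Stations, total ports, and mean ports per station for each class.
per_class = (valid.groupby("power_class")["ports"]
                  .agg(stations="count", total_ports="sum", mean_ports="mean")
                  .sort_values("mean_ports", ascending=False))
print(per_class)
```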




## Which cities have the most fast chargers?

In [15]:
fast_chargers = df[
        (df['city'].notna()) & 
        ((df['is_fast_dc'] == True) | (df['is_fast_dc'] == 'True'))
    ]
    
fast_chargers_per_city = (fast_chargers.groupby('city')
                             .size()
                             .sort_values(ascending=False))
    
print("Top 10 cities by number of fast DC charging stations:")
for i, (city, count) in enumerate(fast_chargers_per_city.head(10).items(), 1):
    print(f"{i:2}. {city}: {count:,} fast chargers")

Top 10 cities by number of fast DC charging stations:
 1. London: 278 fast chargers
 2. Madrid: 130 fast chargers
 3. Hamburg: 106 fast chargers
 4. Los Angeles: 93 fast chargers
 5. Berlin: 87 fast chargers
 6. Bengaluru: 84 fast chargers
 7. Houston: 81 fast chargers
 8. San Diego: 80 fast chargers
 9. Vilnius: 77 fast chargers
10. Нижний Новгород: 74 fast chargers
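
The filter above compares is_fast_dc against both True and 'True', which guards against the column being read as either booleans or strings. An alternative, sketched here on invented rows, is to build one boolean mask first and reuse it. This assumes the column only ever holds true/false style values; anything else, including missing values, is treated as not fast.

```python
import pandas as pd

# Invented rows mixing boolean and string flags.
sample = pd.DataFrame({
    "city": ["London", "London", "Madrid", None, "Madrid"],
    "is_fast_dc": [True, "True", "False", True, False],
})

# Convert every value to text and compare, so True and "True" both match.
is_fast = sample["is_fast_dc"].astype(str).str.lower() == "true"

# Count fast chargers per city, skipping rows without a city name.
fast_per_city = (sample[is_fast & sample["city"].notna()]
                 .groupby("city")
                 .size()
                 .sort_values(ascending=False))
print(fast_per_city)
```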

