# HW03B

Some exercises to get started with lists, dicts and data in Python

## Goals

- Gain experience with a popular scripting language used for ML/AI projects and research
- Get familiar with Python's notation for lists and objects
- Experiment with Python's unique functionalities for processing lists and objects
- Process a dataset using Python

These skills are invaluable in any data-driven project, whether it involves data processing or the use of ML models.

### Setup

Run the following cell to import the libraries and helpers needed for this homework.

In [1]:
from HW03_utils import object_from_json_path, Tests

### Working with data files

Find the names of the 3 cities that are geographically closest to the world's most populous city.

# 🤔😱


#### Load Data:

Let's break this down into a few sub-problems.

First, let's load a JSON file that has information about large cities in the world.

The file at this [URL](https://raw.githubusercontent.com/DM-GY-9103-2024F-H/9103-utils/main/datasets/json/cities50k.json) has a list of cities formatted like this:

```py
{
  "name": "Pittsburgh",
  "country": "US",
  "admin1": "Pennsylvania",
  "lat": 40.4406200,
  "lon": -79.9958900,
  "pop": 304391
}
```

This is just like how we loaded ANSUR data files in class:

In [2]:
# Define the location of the json file here
CITIES_FILE = "./data/cities50k.json"

# Use the object_from_json_path() function to load contents from
# the json file into a Python object called "info_cities"

info_cities = object_from_json_path(CITIES_FILE)

In [13]:
info_cities[5]

{'name': 'Bani Yas City',
 'country': 'AE',
 'admin1': 'Abū Z̧aby',
 'lat': 24.30978,
 'lon': 54.62944,
 'pop': 80498}
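
The implementation of `object_from_json_path` lives in `HW03_utils` and is not shown here. As a rough idea of what such a helper might do, here is a minimal sketch using Python's standard `json` module. The function name `load_json` and the temporary sample file are assumptions made for illustration only.

```python
import json
import os
import tempfile

def load_json(path):
    """Hypothetical stand-in for object_from_json_path: parse a JSON file into Python objects."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Write a tiny sample file so this sketch runs on its own
sample = [{"name": "Pittsburgh", "country": "US", "admin1": "Pennsylvania",
           "lat": 40.44062, "lon": -79.99589, "pop": 304391}]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_cities.json")
    with open(sample_path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    loaded = load_json(sample_path)

# A JSON array becomes a Python list, and each JSON object becomes a dict
print(type(loaded), loaded[0]["name"])
```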

#### Exercise A:

Ok. We should now have a list of objects with information about cities.

Explore the data and answer the following questions:
- How many cities are in this list?
- What's the name of the first city on the list?
- What are the latitude and longitude of the last city on the list?
- What are the populations for the largest and smallest cities?
- What's the name of the city with the largest population?


In [None]:
# Work on A here


# How many cities are in the list?

num_cities = len(info_cities)
print("num_cities",num_cities)


# What's the name of the first city on the list?

first_city = info_cities[0]["name"]
print("first_city",first_city)


# What are the latitude and longitude of the last city on the list?

last_latitude = info_cities[-1]["lat"]
last_longitude = info_cities[-1]["lon"]

print("last_latitude",last_latitude)
print("last_longitude",last_longitude)


# What are the populations for the largest and smallest cities?

largest_population = 0
for i in info_cities:
    p = int(i["pop"])
    if largest_population < p:
        largest_population = p
print("largest_population",largest_population)

smallest_population = largest_population
for j in info_cities:
    p1 = int(j["pop"])
    if smallest_population > p1:
        smallest_population = p1
print("smallest_population",smallest_population)

# What's the name of the city with the largest population?
largest_city_name = ""
for i in info_cities:
    p = i["pop"]
    if largest_population == p:
        largest_city_name = i["name"]

print("name of the city with the largest population: ",largest_city_name)



num_cities 8670
first_city Abu Dhabi
last_latitude -17.88333
last_longitude 30.7
largest_population 22315474
smallest_population 50011
name of the city with the largest population:  Shanghai
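
The loops above work fine. As an alternative, Python's built-in `max()` and `min()` accept a `key` function, which lets you find the whole city object in one step. The sketch below uses a small made-up list (`sample_cities` and its values are invented for illustration, not taken from the dataset).

```python
# Made-up sample data, for illustration only
sample_cities = [
    {"name": "A-town", "pop": 120000},
    {"name": "B-ville", "pop": 75000},
    {"name": "C-city", "pop": 980000},
]

# key tells max/min which value to compare for each item
largest = max(sample_cities, key=lambda c: c["pop"])
smallest = min(sample_cities, key=lambda c: c["pop"])

print(largest["name"], largest["pop"])
print(smallest["name"], smallest["pop"])
```

One difference to keep in mind: if two cities tie, `max()` and `min()` return the first match, while the name-lookup loop above keeps the last one.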


#### Test A

In [12]:
# Test A
answers = [num_cities, first_city, last_latitude, last_longitude, largest_population, smallest_population, largest_city_name]

Tests.test("A", answers)

A: All tests passed 🎉🎉🎉


#### Exercise B:

We have the largest city's name and population, but we need its location.

We can recycle some of the logic from above to get the whole object that contains information for the largest city.

In [21]:
# Work on B here

largest_city = {}

# Reuse the population found in Exercise A instead of hard-coding the name
for i in info_cities:
    if i["pop"] == largest_population:
        largest_city = i

print(largest_city)



{'name': 'Shanghai', 'country': 'CN', 'admin1': 'Shanghai Shi', 'lat': 31.22222, 'lon': 121.45806, 'pop': 22315474}


In [22]:
# Test B
Tests.test("B", largest_city)

B: All tests passed 🎉🎉🎉


#### Exercise C:

We should have all info about the largest city here.

Now, we'll iterate through the list and use each city's latitude and longitude to calculate its distance from the largest city.

Although not $100\%$ correct, it's OK to use the [2D Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance#Two_dimensions) for this.

It could be useful to define a function `distance(cityA, cityB)` that returns the distance between two cities.

In [29]:
# Work on C here

# Implement the helper function for calculating distances between 2 cities

import math

def distance(cityA, cityB):
  x1 = cityA['lat']
  y1 = cityA['lon']
  x2 = cityB['lat']
  y2 = cityB['lon']

  # 2D Euclidean distance, treating lat/lon as plain x/y coordinates
  total = math.sqrt((x2 - x1)**2 + (y2 - y1)**2)
  return total






In [30]:
# Test C
Tests.test("C", distance)

C: All tests passed 🎉🎉🎉
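
The Euclidean distance is only approximate because latitude and longitude are angles on a sphere, not coordinates on a flat plane. For comparison, here is a sketch of the haversine formula, a common way to estimate great-circle distance. It is not required for this homework and is not what the tests expect. The function name `haversine_km` is made up for this example, and the Earth radius is an approximate mean value.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate mean Earth radius

def haversine_km(cityA, cityB):
    """Approximate great-circle distance in km between two cities with 'lat'/'lon' in degrees."""
    lat1, lon1 = math.radians(cityA["lat"]), math.radians(cityA["lon"])
    lat2, lon2 = math.radians(cityB["lat"]), math.radians(cityB["lon"])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

# Coordinates copied from the outputs shown in this notebook
shanghai = {"lat": 31.22222, "lon": 121.45806}
abu_dhabi = {"lat": 24.45118, "lon": 54.39696}

print(haversine_km(shanghai, abu_dhabi))
```

Unlike the Euclidean version, this result is in kilometers rather than degrees, so the two numbers are not directly comparable.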


#### Exercise D:

OK. We implemented a function to calculate the distance between 2 cities, so let's use it now.

Iterate through the list of cities again, calculate the distance from each city to the largest city, and add that as a new feature/key to each city's entry:

```py
{
  "name": "Pittsburgh",
  "country": "US",
  "admin1": "Pennsylvania",
  "lat": 40.4406200,
  "lon": -79.9958900,
  "pop": 304391,
  "distance_to_largest": 1222.32
}
```

Just make sure the `key` that holds the distance value is called `distance_to_largest`.

In [33]:
# Work on D here

# Calculate every city's distance from the largest city
# and store it in each city's entry in info_cities

for city in info_cities:
    city["distance_to_largest"] = distance(largest_city, city)

print(info_cities[:5])



[{'name': 'Abu Dhabi', 'country': 'AE', 'admin1': 'Abū Z̧aby', 'lat': 24.45118, 'lon': 54.39696, 'pop': 603492, 'distance_to_largest': 67.40206314269321}, {'name': 'Ajman City', 'country': 'AE', 'admin1': '‘Ajmān', 'lat': 25.40177, 'lon': 55.47878, 'pop': 490035, 'distance_to_largest': 66.235511831048}, {'name': 'Al Ain City', 'country': 'AE', 'admin1': 'Abū Z̧aby', 'lat': 24.19167, 'lon': 55.76056, 'pop': 55091, 'distance_to_largest': 66.0726126284749}, {'name': 'Al Fujairah City', 'country': 'AE', 'admin1': 'Al Fujayrah', 'lat': 25.11641, 'lon': 56.34141, 'pop': 86512, 'distance_to_largest': 65.40228606844411}, {'name': 'Al Shamkhah City', 'country': 'AE', 'admin1': 'Abū Z̧aby', 'lat': 24.39268, 'lon': 54.70779, 'pop': 61710, 'distance_to_largest': 67.09874187855165}]


In [32]:
# Test D
Tests.test("D", info_cities)

D: All tests passed 🎉🎉🎉
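
Note that the loop in this exercise modifies the dicts inside `info_cities` in place. If you would rather leave the original list untouched, one option is to build new dicts with dict unpacking. The sketch below uses made-up data and a hypothetical helper `euclid`, so it runs on its own.

```python
import math

def euclid(cityA, cityB):
    """Hypothetical helper: 2D Euclidean distance on lat/lon values."""
    return math.hypot(cityB["lat"] - cityA["lat"], cityB["lon"] - cityA["lon"])

# Made-up sample data, for illustration only
reference = {"name": "Ref", "lat": 0.0, "lon": 0.0}
sample_cities = [
    {"name": "Near", "lat": 3.0, "lon": 4.0},
    {"name": "Far", "lat": 30.0, "lon": 40.0},
]

# {**city, ...} copies every key from city into a new dict and adds one more
with_distance = [
    {**city, "distance_to_largest": euclid(reference, city)}
    for city in sample_cities
]

print(with_distance[0])
print("distance_to_largest" in sample_cities[0])  # the original dicts are unchanged
```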


#### Exercise E:

Now, sort the list from the previous step by distance and get the names of the $3$ cities closest to the largest city, not including the largest city itself. In other words, if you sort the list from the exercise above by ascending `distance_to_largest`, the $3$ cities closest to the largest city will be in the slice `[1:4]`. The city at index $0$ is the city with the largest population, which has a distance of $0$ from itself.

The answer should be a list with city names.

Something like:

```python
closest_3 = [ "pittsburgh", "liverpool", "oakland" ]
```

We saw how to sort lists of objects in lecture.
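
As a quick refresher, here is a small sketch of sorting a list of dicts by one key and then slicing the result. The data is made up for illustration; only the key name `distance_to_largest` comes from this exercise.

```python
# Made-up sample data, for illustration only
sample = [
    {"name": "origin", "distance_to_largest": 0.0},
    {"name": "far", "distance_to_largest": 42.0},
    {"name": "near", "distance_to_largest": 1.5},
    {"name": "middle", "distance_to_largest": 7.25},
    {"name": "farthest", "distance_to_largest": 99.0},
]

# sorted() returns a new list; key picks the value to sort by (ascending by default)
by_distance = sorted(sample, key=lambda c: c["distance_to_largest"])

# Skip index 0 (distance 0, the reference itself) and take the next 3 names
closest_names = [c["name"] for c in by_distance[1:4]]
print(closest_names)  # ['near', 'middle', 'far']
```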

In [None]:
# Work on E here

# Sort the list and get the names of the 3 cities closest to the largest city

# Reuse the distance_to_largest values stored in Exercise D
cities_by_distance = sorted(info_cities, key=lambda city: city["distance_to_largest"])

# Index 0 is the largest city itself (distance 0), so take the next 3
closest_3 = [city["name"] for city in cities_by_distance[1:4]]

closest_3 




In [45]:
# Test E
Tests.test("E", closest_3)

E: All tests passed 🎉🎉🎉
