# Get Cities of The World Quality of Life Data

In this exercise, we will try to get quality of life scores for different cities around the world. 

For this exercise, we will be using the following API : 

* [Teleport](https://developers.teleport.org/api/getting_started/)

We will also need to use a website called RandomList.com that will give us random cities around the world to score. 

Then we will store the data we got into an S3 Bucket ! 

🥵🥵 Quite a project, right? 🥵🥵 

🥰🥰 You'll learn a lot during this exercise 🥰🥰 

So let's go 💪💪💪


## Part 1 : Get data for 1 City 

To simplify this exercise, let's start by getting data for only one city: Paris. In the second part, we'll try to get scores for 100 different cities. 

* First, we need to import a library called `requests` 

In [11]:
import requests

* Check Teleport's API to find a way to search for information on Paris. In particular, we need its `geonameid`

  * Here is the link for the documentation 👉👉👉 [Teleport API](https://developers.teleport.org/api/getting_started/)

In [12]:
response = requests.get("https://api.teleport.org/api/cities/?search=Paris")
response

<Response [200]>
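City names with spaces or accents need to be percent-encoded before they go into the query string. Here is a minimal sketch of how the search URL could be built with the standard library; `SEARCH_ENDPOINT` and `build_search_url` are hypothetical names introduced for illustration, not part of the Teleport API.

```python
from urllib.parse import urlencode

# Endpoint taken from the request above
SEARCH_ENDPOINT = "https://api.teleport.org/api/cities/"


def build_search_url(city_name):
    """Return the search URL for a city, percent-encoding the name."""
    query = urlencode({"search": city_name.strip()})
    return f"{SEARCH_ENDPOINT}?{query}"


print(build_search_url("Paris"))
print(build_search_url("São Paulo"))
```

With `requests`, passing `params={"search": city_name}` to `requests.get(SEARCH_ENDPOINT, ...)` applies the same kind of encoding for you.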

ℹ️ℹ️You should get the following result ℹ️ℹ️

In [13]:
response.json()

{'_embedded': {'city:search-results': [{'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:2988507/'}},
    'matching_alternate_names': [{'name': 'Paris'},
     {'name': 'paris'},
     {'name': 'Parisi'}],
    'matching_full_name': 'Paris, Île-de-France, France'},
   {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:4717560/'}},
    'matching_alternate_names': [{'name': 'Paris'}],
    'matching_full_name': 'Paris, Texas, United States'},
   {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:3489854/'}},
    'matching_alternate_names': [],
    'matching_full_name': 'Kingston, Kingston, Jamaica'},
   {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:966166/'}},
    'matching_alternate_names': [{'name': 'Paris'}],
    'matching_full_name': 'Parys, Orange Free State, South Africa (Paris)'},
   {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geoname

* Now that you have a list of search results, try to isolate the link that contains Paris' `geonameid`

In [14]:
first_result = response.json()["_embedded"]["city:search-results"][0]
paris_link = first_result["_links"]["city:item"]["href"]
paris_link

'https://api.teleport.org/api/cities/geonameid:2988507/'
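The link above embeds the `geonameid` itself. If you want the number on its own, one possible sketch is to pull it out with a regular expression. `extract_geonameid` is a hypothetical helper, and it assumes the link always follows the `geonameid:<digits>` pattern seen in the output above.

```python
import re


def extract_geonameid(href):
    """Return the numeric geonameid found in a Teleport city link, or None."""
    match = re.search(r"geonameid:(\d+)", href)
    return int(match.group(1)) if match else None


paris_link = "https://api.teleport.org/api/cities/geonameid:2988507/"
print(extract_geonameid(paris_link))  # 2988507
```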

* Use `requests` to get information about Paris 

In [15]:
paris_info = requests.get(paris_link)
paris_info.json()

{'_links': {'city:admin1_division': {'href': 'https://api.teleport.org/api/countries/iso_alpha2:FR/admin1_divisions/geonames:11/',
   'name': 'Île-de-France'},
  'city:alternate-names': {'href': 'https://api.teleport.org/api/cities/geonameid:2988507/alternate_names/'},
  'city:country': {'href': 'https://api.teleport.org/api/countries/iso_alpha2:FR/',
   'name': 'France'},
  'city:timezone': {'href': 'https://api.teleport.org/api/timezones/iana:Europe%2FParis/',
   'name': 'Europe/Paris'},
  'city:urban_area': {'href': 'https://api.teleport.org/api/urban_areas/slug:paris/',
   'name': 'Paris'},
  'curies': [{'href': 'https://developers.teleport.org/api/resources/Location/#!/relations/{rel}/',
    'name': 'location',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/City/#!/relations/{rel}/',
    'name': 'city',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/UrbanArea/#!/relations/{rel}/',
    'name': 'ua',
    'templa

* You should now be able to get Paris' quality of life scores 

In [16]:
paris_scores = requests.get(paris_info.json()["_links"]["city:urban_area"]["href"]+"scores/")
paris_scores

<Response [200]>

In [17]:
paris_scores.json()

{'_links': {'curies': [{'href': 'https://developers.teleport.org/api/resources/Location/#!/relations/{rel}/',
    'name': 'location',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/City/#!/relations/{rel}/',
    'name': 'city',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/UrbanArea/#!/relations/{rel}/',
    'name': 'ua',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/Country/#!/relations/{rel}/',
    'name': 'country',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/Admin1Division/#!/relations/{rel}/',
    'name': 'a1',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/Timezone/#!/relations/{rel}/',
    'name': 'tz',
    'templated': True}],
  'self': {'href': 'https://api.teleport.org/api/urban_areas/slug:paris/scores/'}},
 'categories': [{'color': '#f3c32c',
   'name': 'Housing',
   'score_out_of_10': 3.5835}
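The scores live under the city's urban area, so the request only works when the city response contains a `city:urban_area` link. The sketch below assumes that some cities may lack that link (an assumption, not something shown in the outputs above) and returns `None` in that case. `scores_url` is a hypothetical helper name.

```python
def scores_url(city_info):
    """Return the scores URL for a city's urban area, or None if it has none."""
    urban_area = city_info.get("_links", {}).get("city:urban_area")
    if urban_area is None:
        return None
    # Works whether or not the href already ends with a slash
    return urban_area["href"].rstrip("/") + "/scores/"


sample_city_info = {
    "_links": {
        "city:urban_area": {
            "href": "https://api.teleport.org/api/urban_areas/slug:paris/",
            "name": "Paris",
        }
    }
}
print(scores_url(sample_city_info))
print(scores_url({"_links": {}}))
```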

* Use `Pandas` to create a DataFrame where you'll get all the scores for Paris 

In [18]:
import pandas as pd 
paris_df = pd.DataFrame(paris_scores.json()["categories"])
paris_df.head()

| | color | name | score_out_of_10 |
|---|---|---|---|
| 0 | #f3c32c | Housing | 3.5835 |
| 1 | #f3d630 | Cost of Living | 3.664 |
| 2 | #f4eb33 | Startups | 9.2765 |
| 3 | #d2ed31 | Venture Capital | 7.513 |
| 4 | #7adc29 | Travel Connectivity | 10.0 |


* We now need to upload this DataFrame to S3. Let's first create a Boto3 session 
  * For the next steps, refer to this documentation 👉👉👉 [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html)

In [19]:
import boto3
session = boto3.Session(aws_access_key_id="YOUR_ACCESS_KEY_ID", 
                        aws_secret_access_key="YOUR_SECRET_ACCESS_KEY")
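Writing real keys directly in a notebook makes them easy to leak. As a hedged alternative, the sketch below reads them from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables and fails early if one is missing. `aws_session_kwargs` is a hypothetical helper, and the fake environment is only there so the example runs without real credentials.

```python
import os


def aws_session_kwargs(env=None):
    """Build keyword arguments for boto3.Session from environment variables."""
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


# Demo with a fake environment (placeholder values, not real credentials)
fake_env = {"AWS_ACCESS_KEY_ID": "AKIA-EXAMPLE", "AWS_SECRET_ACCESS_KEY": "secret-example"}
print(sorted(aws_session_kwargs(fake_env)))
```

You could then write `session = boto3.Session(**aws_session_kwargs())`. Note that `boto3.Session()` called with no arguments also picks up these variables by itself.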

* Now create a resource session 

In [20]:
s3 = session.resource("s3")

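* Create a bucket called `scoring-cities-in-the-world`. Bucket names are unique across all of S3, so add a suffix of your own if the name is already taken (`-132` in the example below)

In [None]:
# create_bucket() returns a Bucket resource (not a string); it is reused in the next cells
bucket_name = s3.create_bucket(Bucket="scoring-cities-in-the-world-132")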
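S3 rejects bucket names that break its naming rules, so it can help to check a name before calling `create_bucket()`. The sketch below covers only the basic rules (3 to 63 characters, lowercase letters, digits, hyphens and dots, starting and ending with a letter or digit); it is a simplification, not the full S3 rule set, and `looks_like_valid_bucket_name` is a hypothetical helper.

```python
import re

# Simplified pattern: 3 to 63 characters, alphanumeric at both ends
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def looks_like_valid_bucket_name(name):
    """Return True if the name passes the basic S3 naming checks."""
    return bool(_BUCKET_NAME_RE.match(name)) and ".." not in name


print(looks_like_valid_bucket_name("scoring-cities-in-the-world-132"))
print(looks_like_valid_bucket_name("Scoring_Cities"))
```

A name that passes this check can still be refused if another AWS account already owns it.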

* Use `Pandas` to export your DataFrame as CSV (calling `to_csv()` without a path returns the CSV content as a string)

In [None]:
csv = paris_df.to_csv()

* Use the `put_object()` method to create an object in the bucket you just created 

In [None]:
put_object = bucket_name.put_object(Key="paris-scoring.csv", Body=csv)
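The key `paris-scoring.csv` was typed by hand here. When the key is built from a city name, names such as "São Paulo" or "New York City" contain accents and spaces. S3 keys can hold UTF-8, so cleaning them is optional, but a predictable ASCII key is easier to work with. A possible sketch follows, with `scoring_key` as a hypothetical helper.

```python
import re
import unicodedata


def scoring_key(city_name):
    """Return an ASCII, lowercase, hyphenated object key for a city."""
    # Split accented characters into base letter + accent, then drop the accents
    decomposed = unicodedata.normalize("NFKD", city_name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace every run of non-alphanumeric characters with a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    return f"{slug}-scoring.csv"


print(scoring_key("Paris"))      # paris-scoring.csv
print(scoring_key("São Paulo"))  # sao-paulo-scoring.csv
```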

## Part 2 : Get Data for Several Cities 

😉 Congrats ! 😉 You made it to the second part of the exercise. We now need data for more cities so that we can compare them later. Let's find a way to get data for a lot more cities. 

* Go on to [this Wikipedia page](https://en.wikipedia.org/wiki/List_of_largest_cities). There you'll find a list of the world's largest cities.
  * Use `scrapy` to scrape the city names directly from this page 😎

In [26]:
import os
import logging

import scrapy
from scrapy.crawler import CrawlerProcess

In [30]:
class CityNamesSpider(scrapy.Spider):

    name = "citynames"

    start_urls = [
        'https://en.wikipedia.org/wiki/List_of_largest_cities',
    ]

    def parse(self, response):
        cities = response.css("tr")
        for city in cities[6:87]:
            yield {
                'city': city.css('a::text').get()
            }

In [32]:
filename = "cities.json"

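# Create the output folder if needed, then delete any previous export
os.makedirs('res', exist_ok=True)
if filename in os.listdir('res/'):
    os.remove('res/' + filename)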

process = CrawlerProcess(settings = {
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'LOG_LEVEL': logging.INFO,
    "FEEDS": {
        'res/' + filename : {"format": "json"},
    }
})

process.crawl(CityNamesSpider)
process.start()

2020-12-17 18:59:11 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: scrapybot)
2020-12-17 18:59:11 [scrapy.utils.log] INFO: Versions: lxml 4.6.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.8.5 (default, Sep  4 2020, 02:22:02) - [Clang 10.0.0 ], pyOpenSSL 20.0.1 (OpenSSL 1.1.1i  8 Dec 2020), cryptography 3.3.1, Platform macOS-10.16-x86_64-i386-64bit
2020-12-17 18:59:11 [scrapy.crawler] INFO: Overridden settings:
{'LOG_LEVEL': 20,
 'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'}
2020-12-17 18:59:11 [scrapy.extensions.telnet] INFO: Telnet Password: c2a410264bffbda8
2020-12-17 18:59:11 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats']
2020-12-17 18:59:11 [scrapy.middleware] INFO: Enabled downloader middlewares:
['sc

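ReactorNotRestartable: 

ℹ️ℹ️ `ReactorNotRestartable` appears when this cell is run a second time in the same kernel: `process.start()` can only be called once per Python process. Restart the kernel, then run the cell again ℹ️ℹ️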

* Read the JSON file containing the crawl results:

In [33]:
city_names = pd.read_json("res/cities.json")
city_names.head()

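ValueError: Expected object or value

ℹ️ℹ️ This error means `res/cities.json` is missing or empty, which happens when the crawl did not finish. Make sure the crawl completes before reading the file ℹ️ℹ️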
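The spider takes the first link text of each table row, so some records may hold `None`, stray whitespace or duplicates. Here is a small sketch for tidying the scraped records before looping over them. It works on a plain list of dictionaries shaped like the entries of `cities.json`; `clean_city_names` is a hypothetical helper and the sample records are made up for the demo.

```python
def clean_city_names(records):
    """Return unique, non-empty city names in their original order."""
    seen = set()
    names = []
    for record in records:
        name = (record.get("city") or "").strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


sample_records = [{"city": "Tokyo"}, {"city": None}, {"city": " Delhi "}, {"city": "Tokyo"}]
print(clean_city_names(sample_records))  # ['Tokyo', 'Delhi']
```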

* Finally, create a loop that goes through each city, searches for its information and stores it in your S3 bucket 
  * You will probably hit some errors, so wrap the body of the loop in a `try` / `except` block 
  * (It's totally fine if you can't get info for all cities) 😌😌

In [None]:
for city in city_names['city']:
    try:
        # Let requests encode the city name (spaces, accents) in the query string
        search_city = requests.get("https://api.teleport.org/api/cities/", params={"search": city})
        first_result = search_city.json()["_embedded"]["city:search-results"][0]
        city_link = first_result["_links"]["city:item"]["href"]
        city_info = requests.get(city_link)
        # Cities without an urban area raise a KeyError here and are skipped
        city_scores = requests.get(city_info.json()["_links"]["city:urban_area"]["href"]+"scores/")
        city_df = pd.DataFrame(city_scores.json()["categories"])
        csv = city_df.to_csv()
        put_object = bucket_name.put_object(Key="{}-scoring.csv".format(city), Body=csv)
        print("{} done".format(city))
    except Exception:
        # Any failure (no search result, no urban area, network or S3 error) skips the city
        print("Couldn't find results for {}".format(city))

Couldn't find results for Tokyo
Couldn't find results for Delhi
Couldn't find results for Shanghai
Couldn't find results for São Paulo
Couldn't find results for Mexico City
Couldn't find results for Cairo
Couldn't find results for Mumbai
Couldn't find results for Beijing
Couldn't find results for Dhaka
Couldn't find results for Osaka
Couldn't find results for New York City
Couldn't find results for Karachi
Couldn't find results for Buenos Aires
Couldn't find results for Chongqing
Couldn't find results for Istanbul
Couldn't find results for Kolkata
Couldn't find results for Manila
Couldn't find results for Lagos
Couldn't find results for Rio de Janeiro
Couldn't find results for Tianjin
Couldn't find results for Kinshasa
Couldn't find results for Guangzhou
Couldn't find results for Los Angeles
Couldn't find results for Moscow
Couldn't find results for Shenzhen
Couldn't find results for Lahore
Couldn't find results for Bangalore
Couldn't find results for Paris
Couldn't find results for Bo
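When every city fails, as in the output above, a catch-all `except` hides the reason. One way to investigate is to return the reason together with the result. The sketch below is only an illustration under stated assumptions: `fetch_categories` is a hypothetical function, and it takes the HTTP function as a parameter so it can be tried offline with `fake_get` and `FakeResponse`, two made-up stand-ins for `requests.get` and its response.

```python
def fetch_categories(city, http_get):
    """Follow search -> city -> urban area -> scores.

    Returns (categories, None) on success or (None, reason) on failure.
    """
    try:
        search = http_get("https://api.teleport.org/api/cities/", params={"search": city}).json()
        results = search["_embedded"]["city:search-results"]
        if not results:
            return None, "no search results"
        city_info = http_get(results[0]["_links"]["city:item"]["href"]).json()
        urban_area = city_info["_links"].get("city:urban_area")
        if urban_area is None:
            return None, "no urban area"
        scores = http_get(urban_area["href"] + "scores/").json()
        return scores["categories"], None
    except (KeyError, ValueError) as error:
        return None, f"unexpected response: {error!r}"


class FakeResponse:
    """Offline stand-in for a requests response (demo only)."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


def fake_get(url, params=None):
    """Offline stand-in for requests.get that returns canned payloads (demo only)."""
    if params is not None:
        if params["search"] == "Paris":
            item = {"_links": {"city:item": {"href": "city/paris/"}}}
            return FakeResponse({"_embedded": {"city:search-results": [item]}})
        return FakeResponse({"_embedded": {"city:search-results": []}})
    if url == "city/paris/":
        return FakeResponse({"_links": {"city:urban_area": {"href": "ua/paris/"}}})
    return FakeResponse({"categories": [{"name": "Housing", "score_out_of_10": 3.5835}]})


print(fetch_categories("Paris", fake_get))
print(fetch_categories("Atlantis", fake_get))
```

In the notebook you would call `fetch_categories(city, requests.get)` and print the reason for each city that fails. Network failures raise `requests.RequestException`, which this sketch does not catch.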

🎊🎊🎊 Congratulations, you made it to the end of this exercise !! 🎊🎊🎊