# Belongs to Airbnb Lab

### Introduction
In this lab we continue to explore the relationships between tables in a database. The Airbnb database for this lab contains four tables: `hosts`, `listings`, `locations`, and `neighborhoods`. To analyze the data, we first need to understand how these tables relate to one another. The two kinds of relationship are "Has One" and "Has Many". For example, a listing has one host, and a location has many listings.

Let's begin by connecting to the database and reviewing the schema of each table.

### Loading and Exploring Data

The data is already loaded into the SQLite database. If another copy is needed, it is also available in the `/data` folder.

In [1]:
import sqlite3
conn = sqlite3.connect('./airbnb.db')
cursor = conn.cursor()

In [2]:
import pandas as pd
root_url = "https://raw.githubusercontent.com/data-eng-10-21/belongs-to-bnb/master/data/"
names = ['hosts', 'neighborhoods', 'locations', 'listings']
loaded_dfs = [pd.read_csv(f'{root_url}{name}.csv') for name in names]


In [3]:
for name, df in zip(names, loaded_dfs):
    df.to_sql(name, conn, index=False, if_exists='replace')

In [4]:
cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
cursor.fetchall()

[('hosts',), ('neighborhoods',), ('locations',), ('listings',)]

In [5]:
cursor.execute('PRAGMA table_info(hosts)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0), (1, 'host_name', 'TEXT', 0, None, 0)]

In [6]:
cursor.execute('PRAGMA table_info(neighborhoods)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'name', 'TEXT', 0, None, 0),
 (2, 'neighbourhood_group', 'TEXT', 0, None, 0)]

In [7]:
cursor.execute('PRAGMA table_info(locations)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'longitude', 'REAL', 0, None, 0),
 (2, 'latitude', 'REAL', 0, None, 0),
 (3, 'neighborhood_id', 'INTEGER', 0, None, 0)]

In [8]:
cursor.execute('PRAGMA table_info(listings)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'name', 'TEXT', 0, None, 0),
 (2, 'host_id', 'INTEGER', 0, None, 0),
 (3, 'location_id', 'INTEGER', 0, None, 0),
 (4, 'number_of_reviews', 'INTEGER', 0, None, 0),
 (5, 'occupancy', 'INTEGER', 0, None, 0),
 (6, 'price', 'INTEGER', 0, None, 0),
 (7, 'room_type', 'TEXT', 0, None, 0),
 (8, 'host_listings_count', 'INTEGER', 0, None, 0)]
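
Running `PRAGMA table_info` once per table gets repetitive. As a minimal sketch, the helper below collects the column names and declared types of every table in one pass. The function name `describe_tables` and the in-memory `demo` database are hypothetical stand-ins for illustration; in the lab you would pass the `conn` opened on `airbnb.db`.

```python
import sqlite3

def describe_tables(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for every table."""
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    table_names = [row[0] for row in cursor.fetchall()]
    schema = {}
    for table in table_names:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        cursor.execute(f'PRAGMA table_info("{table}")')
        schema[table] = [(row[1], row[2]) for row in cursor.fetchall()]
    return schema

# Toy stand-in for airbnb.db (assumption: two small tables are enough to illustrate).
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE hosts (id INTEGER, host_name TEXT)')
demo.execute('CREATE TABLE listings (id INTEGER, name TEXT, host_id INTEGER)')

schema = describe_tables(demo)
for table, columns in schema.items():
    print(table, columns)
```

Columns that end in `_id`, such as `host_id`, are the ones to look for when mapping out relationships.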

We'll start off with some basic single-table queries:

* Which listing name has the highest price?

In [10]:
query = """
        SELECT name, MAX(price)
        FROM listings 
"""

cursor.execute(query)
cursor.fetchall()

[('Furnished room in Astoria apartment', 10000)]

* What is the id of the location with the lowest longitude?

In [13]:
query = """
        SELECT locations.id, MIN(longitude)
        FROM locations 
"""

cursor.execute(query)
cursor.fetchall()

[(45652, -74.24441999999999)]

* What is the greatest occupancy of a listing?

In [18]:
query = """
        SELECT name, occupancy
        FROM listings 
        ORDER BY occupancy DESC
        LIMIT 1
"""

cursor.execute(query)
cursor.fetchall()

[('Entire Apt: Spacious Studio/Loft by central park', 365)]

In [19]:
query = """
        SELECT name, MAX(occupancy)
        FROM listings 
"""

cursor.execute(query)
cursor.fetchall()

[('Entire Apt: Spacious Studio/Loft by central park', 365)]
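
The two queries above return the same row, but for different reasons. `ORDER BY ... LIMIT 1` works in most SQL databases, while `SELECT name, MAX(occupancy)` relies on SQLite's handling of bare columns next to a single `MAX()` or `MIN()`, which many other databases reject. Both also return only one row even when several listings tie for the maximum. A minimal sketch of a tie-aware alternative, using a hypothetical three-row `listings` table rather than the lab data:

```python
import sqlite3

# Toy data (assumption): two listings share the maximum occupancy.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE listings (id INTEGER, name TEXT, occupancy INTEGER)')
demo.executemany(
    'INSERT INTO listings VALUES (?, ?, ?)',
    [(1, 'Loft', 365), (2, 'Studio', 120), (3, 'Townhouse', 365)],
)

# The subquery finds the maximum once; the outer query keeps every row that matches it.
query = """
    SELECT name, occupancy
    FROM listings
    WHERE occupancy = (SELECT MAX(occupancy) FROM listings)
    ORDER BY name
"""
top_listings = demo.execute(query).fetchall()
print(top_listings)  # [('Loft', 365), ('Townhouse', 365)]
```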

* What is the average price of a listing?

In [21]:
query = """
        SELECT AVG(price)
        FROM listings 
        """

cursor.execute(query)
cursor.fetchall()

[(152.7206871868289,)]

* How many hosts are there?

In [22]:
query = """
        SELECT COUNT(id)
        FROM hosts 
        """

cursor.execute(query)
cursor.fetchall()

[(37457,)]

### Relationships
To help us better understand the relationships, create queries below that JOIN the tables. Before doing so, write out the relationships between each of the entities.

> Remember that a foreign key on a table means each row of that table has one row of the table the key points to. For example, the `neighborhood_id` column on `locations` means that a location has one neighborhood, and so a neighborhood has many locations.
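
To make the two directions concrete, here is a minimal sketch on a toy database. The rows are made up, and the `REFERENCES` clause is an assumption added for illustration: the lab's tables were created with `to_sql`, so they declare no constraints and the relationship holds only by naming convention.

```python
import sqlite3

demo = sqlite3.connect(':memory:')
demo.executescript("""
    CREATE TABLE neighborhoods (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE locations (
        id INTEGER PRIMARY KEY,
        neighborhood_id INTEGER REFERENCES neighborhoods(id)
    );
    INSERT INTO neighborhoods VALUES (1, 'Astoria'), (2, 'Harlem');
    INSERT INTO locations VALUES (10, 1), (11, 1), (12, 2);
""")

# "Has one": follow the foreign key from one location to its single neighborhood.
has_one = demo.execute("""
    SELECT neighborhoods.name
    FROM locations
    JOIN neighborhoods ON neighborhoods.id = locations.neighborhood_id
    WHERE locations.id = 10
""").fetchall()

# "Has many": start from one neighborhood and collect every location pointing at it.
has_many = demo.execute("""
    SELECT locations.id
    FROM neighborhoods
    JOIN locations ON locations.neighborhood_id = neighborhoods.id
    WHERE neighborhoods.name = 'Astoria'
    ORDER BY locations.id
""").fetchall()

print(has_one)   # [('Astoria',)]
print(has_many)  # [(10,), (11,)]
```

The same join condition serves both questions; only the starting table and the filter change.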

### JOINs

For the following queries, use the relationships between the tables to find the solutions:

* What are the longitude and latitude of the listing with the highest price?

In [66]:
query = """
        SELECT longitude, latitude, MAX(price) as highprice
        FROM locations 
        JOIN listings ON listings.location_id = locations.id
        """
cursor.execute(query)
cursor.fetchall()

[(-73.91651, 40.7681, 10000)]

* What is the neighborhood id of the listing with the lowest price?

In [67]:
query = """
        SELECT neighborhood_id, MIN(price) as lowprice
        FROM locations 
        JOIN listings ON listings.location_id = locations.id
        """
cursor.execute(query)
cursor.fetchall()

[(6, 0)]

* What are the longitude and latitude of the listing with the lowest price?

In [64]:
query = """
        SELECT longitude, latitude, MIN(price) as lowprice
        FROM locations 
        JOIN listings ON listings.location_id = locations.id
        """
cursor.execute(query)
cursor.fetchall()

[(-73.95428000000001, 40.69023, 0)]
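
A plain `JOIN` keeps only the listings whose `location_id` matches a row in `locations`, so any listing without a match silently drops out of the answers above. A minimal sketch of how one might check for such rows with a `LEFT JOIN`; the data is a toy example, and whether the lab's database contains any unmatched listings is not verified here:

```python
import sqlite3

# Toy data (assumption): listing 102 points at location 99, which does not exist.
demo = sqlite3.connect(':memory:')
demo.executescript("""
    CREATE TABLE locations (id INTEGER, longitude REAL, latitude REAL);
    CREATE TABLE listings (id INTEGER, name TEXT, location_id INTEGER, price INTEGER);
    INSERT INTO locations VALUES (1, -73.95, 40.69), (2, -73.91, 40.76);
    INSERT INTO listings VALUES
        (100, 'Room A', 1, 80),
        (101, 'Room B', 2, 250),
        (102, 'Room C', 99, 40);
""")

# Inner join: only listings with a matching location survive.
inner_rows = demo.execute("""
    SELECT COUNT(*)
    FROM listings
    JOIN locations ON listings.location_id = locations.id
""").fetchone()[0]

# Left join: unmatched listings are kept, with NULL in the locations columns.
orphans = demo.execute("""
    SELECT listings.name
    FROM listings
    LEFT JOIN locations ON listings.location_id = locations.id
    WHERE locations.id IS NULL
""").fetchall()

print(inner_rows)  # 2
print(orphans)     # [('Room C',)]
```

Here the cheapest listing is the unmatched one, so an inner-join query for the lowest price would report 80 rather than 40.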

### Relations and GROUP BY

* What is the name of the host with the most reviews?

In [68]:
query = """
        SELECT host_name, SUM(number_of_reviews) AS total_reviews
        FROM hosts 
        JOIN listings ON listings.host_id = hosts.id
        GROUP BY host_name
        ORDER BY total_reviews DESC  
        LIMIT 1
        """
cursor.execute(query)
cursor.fetchall()

[('Michael', 11081)]

* What is the name of the host with the highest average listing price?

In [53]:
query = """
        SELECT host_name, avg(price) as avg_price
        FROM hosts 
        JOIN listings ON listings.host_id = hosts.id
        GROUP BY host_name
        ORDER BY avg_price DESC
        LIMIT 1
        """
cursor.execute(query)
cursor.fetchall()

[('Olson', 9999.0)]

* What is the name of the host with the lowest average listing price?

In [51]:
query = """
        SELECT host_name, avg(price) as avg_price
        FROM hosts 
        JOIN listings ON listings.host_id = hosts.id
        GROUP BY host_name
        ORDER BY avg_price ASC
        LIMIT 1
        """
cursor.execute(query)
cursor.fetchall()

[('Aymeric', 0.0)]
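
One caveat about the three host queries above: `GROUP BY host_name` merges every host who shares a name into a single group, so a result such as `('Michael', 11081)` may be the combined total of several different hosts. Whether that happens in this dataset is not verified here. The sketch below uses made-up rows to show how grouping by `hosts.id` keeps hosts separate:

```python
import sqlite3

# Toy data (assumption): two distinct hosts are both named 'Michael'.
demo = sqlite3.connect(':memory:')
demo.executescript("""
    CREATE TABLE hosts (id INTEGER, host_name TEXT);
    CREATE TABLE listings (id INTEGER, host_id INTEGER, number_of_reviews INTEGER);
    INSERT INTO hosts VALUES (1, 'Michael'), (2, 'Michael'), (3, 'Dana');
    INSERT INTO listings VALUES (10, 1, 40), (11, 2, 50), (12, 3, 70);
""")

# Grouping by name adds the two Michaels together (40 + 50).
by_name = demo.execute("""
    SELECT host_name, SUM(number_of_reviews) AS total_reviews
    FROM hosts
    JOIN listings ON listings.host_id = hosts.id
    GROUP BY host_name
    ORDER BY total_reviews DESC
    LIMIT 1
""").fetchall()

# Grouping by id treats each host separately.
by_id = demo.execute("""
    SELECT hosts.id, host_name, SUM(number_of_reviews) AS total_reviews
    FROM hosts
    JOIN listings ON listings.host_id = hosts.id
    GROUP BY hosts.id, host_name
    ORDER BY total_reviews DESC
    LIMIT 1
""").fetchall()

print(by_name)  # [('Michael', 90)]
print(by_id)    # [(3, 'Dana', 70)]
```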

* What is the name of the neighborhood with the most locations?

In [44]:
query = """
        SELECT neighborhoods.name, count(locations.id) as total_locations
        FROM neighborhoods 
        JOIN locations ON locations.neighborhood_id = neighborhoods.id
        GROUP BY neighborhoods.name
        ORDER BY total_locations DESC
        LIMIT 1
        """
cursor.execute(query)
cursor.fetchall()

[('Williamsburg', 3914)]

* What are the names of the neighborhoods with 10 locations?

In [45]:
query = """
        SELECT neighborhoods.name, count(locations.id) as total_locations
        FROM neighborhoods 
        JOIN locations ON locations.neighborhood_id = neighborhoods.id
        GROUP BY neighborhoods.name
        HAVING total_locations = 10
        """
cursor.execute(query)
cursor.fetchall()

[('Bergen Beach', 10),
 ('East Morrisania', 10),
 ('Great Kills', 10),
 ('Melrose', 10),
 ('North Riverdale', 10),
 ('Westchester Square', 10)]
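
The query above filters with `HAVING` rather than `WHERE` because the condition applies to an aggregate: `WHERE` removes rows before they are grouped, and `HAVING` removes groups after they are counted. The following sketch combines both on toy data (the rows and the latitude cutoff are made up). It also repeats `COUNT(locations.id)` inside `HAVING` instead of using the alias, since SQLite accepts the alias there but some other databases do not.

```python
import sqlite3

demo = sqlite3.connect(':memory:')
demo.executescript("""
    CREATE TABLE neighborhoods (id INTEGER, name TEXT);
    CREATE TABLE locations (id INTEGER, neighborhood_id INTEGER, latitude REAL);
    INSERT INTO neighborhoods VALUES (1, 'Astoria'), (2, 'Harlem');
    INSERT INTO locations VALUES
        (10, 1, 40.76), (11, 1, 40.77), (12, 1, 40.70),
        (13, 2, 40.81), (14, 2, 40.82);
""")

query = """
    SELECT neighborhoods.name, COUNT(locations.id) AS total_locations
    FROM neighborhoods
    JOIN locations ON locations.neighborhood_id = neighborhoods.id
    WHERE locations.latitude > 40.75      -- row filter, applied before grouping
    GROUP BY neighborhoods.name
    HAVING COUNT(locations.id) = 2        -- group filter, applied after counting
    ORDER BY neighborhoods.name
"""
result = demo.execute(query).fetchall()
print(result)  # [('Astoria', 2), ('Harlem', 2)]
```

Astoria has three locations in the toy table, but the `WHERE` clause drops one before the count, so it still passes the `HAVING` test.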

The following questions require joins across three tables:

* What is the average occupancy of each neighborhood?

In [69]:
query = """
        SELECT n.name, avg(lis.occupancy) as avg_occupancy
        FROM neighborhoods n 
        JOIN locations loc ON loc.neighborhood_id = n.id
        JOIN listings lis ON lis.location_id = loc.id
        GROUP BY n.name
        ORDER BY avg_occupancy DESC
        """
cursor.execute(query)
cursor.fetchall()

[('Bay Terrace, Staten Island', 365.0),
 ('New Dorp', 365.0),
 ('Woodrow', 365.0),
 ('Downtown Brooklyn', 325.51807228915663),
 ('Morningside Heights', 321.9450867052023),
 ('Navy Yard', 316.07142857142856),
 ('Rossville', 306.0),
 ('Cobble Hill', 301.7878787878788),
 ('Stuyvesant Town', 299.8918918918919),
 ('New Springville', 299.75),
 ('Sea Gate', 299.42857142857144),
 ('Brooklyn Heights', 298.961038961039),
 ('Civic Center', 296.7307692307692),
 ('Columbia St', 295.9047619047619),
 ('Nolita', 295.1897233201581),
 ('Carroll Gardens', 293.3304721030043),
 ('Roosevelt Island', 292.4155844155844),
 ('East Village', 290.3761467889908),
 ('Williamsburg', 290.2772959183674),
 ('Boerum Hill', 287.728813559322),
 ('Prospect Heights', 287.484593837535),
 ('Greenpoint', 285.92286995515695),
 ('Vinegar Hill', 285.47058823529414),
 ('Emerson Hill', 284.2),
 ('Little Neck', 283.8),
 ('Windsor Terrace', 283.11464968152865),
 ('Greenwich Village', 282.94897959183675),
 ('South Slope', 281.23943661

* What is the total number of reviews for each neighborhood?

In [49]:
query = """
        SELECT neighborhoods.name, sum(number_of_reviews) as total_reviews
        FROM neighborhoods 
        JOIN locations ON locations.neighborhood_id = neighborhoods.id
        JOIN listings ON listings.location_id = locations.id
        GROUP BY neighborhoods.name
        ORDER BY total_reviews DESC
        """
cursor.execute(query)
cursor.fetchall()

[('Bedford-Stuyvesant', 110352),
 ('Williamsburg', 85427),
 ('Harlem', 75962),
 ('Bushwick', 52514),
 ("Hell's Kitchen", 50227),
 ('East Village', 44670),
 ('East Harlem', 36446),
 ('Crown Heights', 36408),
 ('Upper West Side', 36058),
 ('Upper East Side', 31686),
 ('Lower East Side', 24161),
 ('Chelsea', 23641),
 ('Midtown', 19444),
 ('Greenpoint', 19429),
 ('Astoria', 19310),
 ('Washington Heights', 17161),
 ('East Elmhurst', 15107),
 ('West Village', 14885),
 ('Flushing', 14818),
 ('Park Slope', 14638),
 ('Clinton Hill', 14586),
 ('Prospect-Lefferts Gardens', 14051),
 ('Flatbush', 12787),
 ('East Flatbush', 12448),
 ('Long Island City', 12256),
 ('Prospect Heights', 10875),
 ('Fort Greene', 10608),
 ('South Slope', 10405),
 ('Chinatown', 9941),
 ('Jamaica', 9910),
 ('Sunnyside', 8070),
 ('Sunset Park', 7882),
 ('Ditmars Steinway', 7852),
 ('Ridgewood', 7778),
 ('Gowanus', 7709),
 ('Gramercy', 7682),
 ('SoHo', 7235),
 ('Financial District', 6931),
 ('East New York', 6759),
 ('Greenwi

* Write a query that returns the name and average listing price of each neighborhood

In [50]:
query = """
        SELECT neighborhoods.name, avg(price) as avg_price
        FROM neighborhoods 
        JOIN locations ON locations.neighborhood_id = neighborhoods.id
        JOIN listings ON listings.location_id = locations.id
        GROUP BY neighborhoods.name
        ORDER BY avg_price DESC
        """
cursor.execute(query)
cursor.fetchall()

[('Fort Wadsworth', 800.0),
 ('Woodrow', 700.0),
 ('Tribeca', 490.638418079096),
 ('Sea Gate', 487.85714285714283),
 ('Riverdale', 442.09090909090907),
 ("Prince's Bay", 409.5),
 ('Battery Park City', 367.5571428571429),
 ('Flatiron District', 341.925),
 ('Randall Manor', 336.0),
 ('NoHo', 295.71794871794873),
 ('SoHo', 287.1033519553073),
 ('Midtown', 282.7190938511327),
 ('Neponsit', 274.6666666666667),
 ('West Village', 267.6822916666667),
 ('Greenwich Village', 263.40561224489795),
 ('Chelsea', 249.73854447439354),
 ('Willowbrook', 249.0),
 ('Theater District', 248.01388888888889),
 ('Nolita', 230.13833992094862),
 ('Financial District', 225.49059139784947),
 ('Gramercy', 222.75443786982248),
 ('Little Italy', 222.06611570247935),
 ('Murray Hill', 220.95876288659792),
 ('Breezy Point', 213.33333333333334),
 ('Cobble Hill', 211.92929292929293),
 ('Upper West Side', 210.91831557584982),
 ('Brooklyn Heights', 209.06493506493507),
 ("Hell's Kitchen", 204.79417773237998),
 ('Kips Bay', 
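
Because `fetchall()` returns one row per neighborhood, the outputs of these three-table queries run long. When only the top of the ranking matters, one option is to add a `LIMIT` and round the average inside the query. A minimal sketch on toy data; the rows, the two-decimal rounding, and the limit of 2 are all assumptions chosen for illustration:

```python
import sqlite3

demo = sqlite3.connect(':memory:')
demo.executescript("""
    CREATE TABLE neighborhoods (id INTEGER, name TEXT);
    CREATE TABLE locations (id INTEGER, neighborhood_id INTEGER);
    CREATE TABLE listings (id INTEGER, location_id INTEGER, price INTEGER);
    INSERT INTO neighborhoods VALUES (1, 'Astoria'), (2, 'Harlem'), (3, 'Tribeca');
    INSERT INTO locations VALUES (10, 1), (11, 2), (12, 3), (13, 3);
    INSERT INTO listings VALUES
        (100, 10, 90), (101, 10, 110), (102, 11, 75),
        (103, 12, 400), (104, 13, 500);
""")

# neighborhoods -> locations -> listings, following one foreign key per JOIN.
query = """
    SELECT n.name, ROUND(AVG(lis.price), 2) AS avg_price
    FROM neighborhoods n
    JOIN locations loc ON loc.neighborhood_id = n.id
    JOIN listings lis ON lis.location_id = loc.id
    GROUP BY n.name
    ORDER BY avg_price DESC
    LIMIT ?
"""
top_two = demo.execute(query, (2,)).fetchall()
print(top_two)  # [('Tribeca', 450.0), ('Astoria', 100.0)]
```

Passing the limit as a `?` parameter keeps the query text fixed while letting the caller decide how many rows to show.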

### Conclusion
In this lab we worked on the "Has One" and "Has Many" relations in SQL. We began by mapping out the relations between the tables, which gave us a better idea of how we could then join them in our queries. We finished the lab by creating queries using JOIN clauses that connect the tables using these relationships.
