# Belongs to Airbnb Lab

### Introduction
In this lab we will continue to explore the relationships between data in different tables of a database. The Airbnb database for this lab contains four tables: `hosts`, `listings`, `locations`, and `neighborhoods`. Before we can analyze the data, we need to understand how these tables relate to one another. The two relationships we will use are "Has One" and "Has Many". For example, each row of the `listings` table has a `host_id` column that points to exactly one record in the `hosts` table (a listing has only one host). Conversely, a single `id` in the `locations` table can match many records in the `listings` table through `listings.location_id` (a location can have more than one listing).
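The two relationship types can be sketched with a tiny in-memory SQLite database. This is a minimal illustration, not the lab's data: the column names follow the ones used in this lab (`hosts.id`, `listings.host_id`, `listings.location_id`), while the rows and the `demo_conn` connection name are made up so the lab's own `conn` is left untouched.

```python
import sqlite3

# Throwaway in-memory database with a cut-down version of the lab's schema.
# Invented rows for illustration; not the lab's data.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE hosts (id INTEGER PRIMARY KEY, host_name TEXT);
CREATE TABLE locations (id INTEGER PRIMARY KEY, longitude REAL, latitude REAL);
CREATE TABLE listings (
    id INTEGER PRIMARY KEY,
    name TEXT,
    host_id INTEGER REFERENCES hosts(id),
    location_id INTEGER REFERENCES locations(id)
);
INSERT INTO hosts VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO locations VALUES (10, -73.95, 40.70);
INSERT INTO listings VALUES (100, 'Loft', 1, 10), (101, 'Studio', 2, 10);
""")

# "Has One": following a listing's host_id leads to exactly one host
host_of_loft = demo_conn.execute(
    "SELECT hosts.host_name FROM listings "
    "JOIN hosts ON hosts.id = listings.host_id "
    "WHERE listings.id = 100"
).fetchall()

# "Has Many": one location id can match several listings
listings_at_10 = demo_conn.execute(
    "SELECT name FROM listings WHERE location_id = 10 ORDER BY name"
).fetchall()

print(host_of_loft)    # [('Ana',)]
print(listings_at_10)  # [('Loft',), ('Studio',)]
```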

Let's begin by loading the data into a SQLite database and reviewing the schema of each table.

### Loading Data

In [None]:
import pandas as pd
neighborhoods_url = "https://raw.githubusercontent.com/sql-fundamentals-jigsaw/mod-1-sql-curriculum/master/2-sql-relations/3-belongs-to-bnb/data/neighborhoods.csv"
hosts_url = "https://raw.githubusercontent.com/sql-fundamentals-jigsaw/mod-1-sql-curriculum/master/2-sql-relations/3-belongs-to-bnb/data/hosts.csv"
locations_url = "https://raw.githubusercontent.com/sql-fundamentals-jigsaw/mod-1-sql-curriculum/master/2-sql-relations/3-belongs-to-bnb/data/locations.csv"
listings_url = "https://raw.githubusercontent.com/sql-fundamentals-jigsaw/mod-1-sql-curriculum/master/2-sql-relations/3-belongs-to-bnb/data/listings.csv"


hosts_df = pd.read_csv(hosts_url)
neighborhoods_df = pd.read_csv(neighborhoods_url)

locations_df = pd.read_csv(locations_url)
listings_df = pd.read_csv(listings_url)

In [None]:
import sqlite3
conn = sqlite3.connect('listings.db')
cursor = conn.cursor()

In [None]:
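# if_exists='replace' overwrites any table left in listings.db by a previous run,
# so the cell can be re-run without a "table already exists" error
hosts_df.to_sql('hosts', conn, index=False, if_exists='replace')
neighborhoods_df.to_sql('neighborhoods', conn, index=False, if_exists='replace')
locations_df.to_sql('locations', conn, index=False, if_exists='replace')
listings_df.to_sql('listings', conn, index=False, if_exists='replace')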

### Exploring Data

In [None]:
cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
cursor.fetchall()

[('hosts',), ('neighborhoods',), ('locations',), ('listings',)]

In [None]:
cursor.execute('PRAGMA table_info(hosts)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0), (1, 'host_name', 'TEXT', 0, None, 0)]

In [None]:
cursor.execute('PRAGMA table_info(neighborhoods)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'name', 'TEXT', 0, None, 0),
 (2, 'neighbourhood_group', 'TEXT', 0, None, 0)]

In [None]:
cursor.execute('PRAGMA table_info(locations)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'longitude', 'REAL', 0, None, 0),
 (2, 'latitude', 'REAL', 0, None, 0),
 (3, 'neighborhood_id', 'INTEGER', 0, None, 0)]

In [None]:
cursor.execute('PRAGMA table_info(listings)')
cursor.fetchall()
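Each tuple returned by `PRAGMA table_info` holds six fields: the column index, column name, declared type, a not-null flag, the default value, and a primary-key flag. A small helper can turn those tuples into a name-to-type mapping. The function name `column_types` is hypothetical, and the sketch runs against a throwaway in-memory table rather than `listings.db`. Because the table name is formatted into the PRAGMA string, the helper first checks it against `sqlite_master`.

```python
import sqlite3

def column_types(conn, table):
    """Hypothetical helper: map each column name to its declared type."""
    # Only accept names of tables that actually exist, since the name is
    # formatted directly into the PRAGMA statement below
    known = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise ValueError(f"unknown table: {table}")
    # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk)
    return {name: col_type
            for _cid, name, col_type, _notnull, _default, _pk
            in conn.execute(f"PRAGMA table_info({table})")}

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE hosts (id INTEGER, host_name TEXT)")
print(column_types(demo_conn, "hosts"))  # {'id': 'INTEGER', 'host_name': 'TEXT'}
```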

We'll start off with some basic single-table queries:

* What is the name of the listing with the highest price?

In [None]:
cursor.execute("select name from listings order by price desc limit 1")
cursor.fetchall()
# [('Furnished room in Astoria apartment',)]
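`ORDER BY price DESC LIMIT 1` always returns a single row, so if several listings share the top price, which one appears is not guaranteed. A sketch of an alternative on invented rows: comparing against a `MAX()` subquery returns every tied listing. The same `MAX()` aggregate can also answer questions such as the greatest occupancy directly.

```python
import sqlite3

# Invented rows for illustration: two listings tie for the highest price
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE listings (name TEXT, price INTEGER)")
demo_conn.executemany("INSERT INTO listings VALUES (?, ?)",
                      [("Loft", 500), ("Penthouse", 500), ("Studio", 90)])

# ORDER BY ... LIMIT 1: exactly one row, tie broken arbitrarily
one_row = demo_conn.execute(
    "SELECT name FROM listings ORDER BY price DESC LIMIT 1").fetchall()

# MAX() subquery: every listing that shares the highest price
all_ties = demo_conn.execute(
    "SELECT name FROM listings "
    "WHERE price = (SELECT MAX(price) FROM listings) "
    "ORDER BY name").fetchall()

print(len(one_row))  # 1
print(all_ties)      # [('Loft',), ('Penthouse',)]
```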

* What is the id of the location with the lowest longitude?

In [None]:
cursor.execute("select id from locations order by longitude asc limit 1")
cursor.fetchall()

# [(45652,)]

* What is the greatest occupancy of a listing?

In [None]:
cursor.execute("select occupancy from listings order by occupancy desc limit 1")
cursor.fetchall()

# [(365,)]

* What is the average price of a listing?

In [None]:
cursor.execute("select avg(price) from listings")
cursor.fetchall()

# [(152.7206871868289,)]

* What is the total number of hosts?

In [None]:
cursor.execute("select count(host_name) from hosts ")
cursor.fetchall()
# [(37457,)]
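`COUNT(host_name)` counts only rows whose `host_name` is not NULL, while `COUNT(*)` counts every row, so the two differ whenever some hosts lack a recorded name. Whether the lab's `hosts` table contains NULL names is not shown above; the rows in this sketch are invented.

```python
import sqlite3

# Invented rows for illustration: one host has no recorded name
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE hosts (id INTEGER, host_name TEXT)")
demo_conn.executemany("INSERT INTO hosts VALUES (?, ?)",
                      [(1, "Ana"), (2, None), (3, "Ben")])

# COUNT(*) counts rows; COUNT(column) skips NULLs in that column
all_rows, named_rows = demo_conn.execute(
    "SELECT COUNT(*), COUNT(host_name) FROM hosts").fetchone()

print(all_rows, named_rows)  # 3 2
```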

### Relationships
To help us better understand the relationships, create queries below that JOIN the tables. 
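A plain `JOIN` is an inner join: it keeps only rows that have a match on both sides, so a location without any listing drops out of the result. A `LEFT JOIN` keeps every row from the left-hand table and fills the unmatched side with NULL. The sketch below uses invented rows.

```python
import sqlite3

# Invented rows for illustration: location 2 has no listing
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE locations (id INTEGER, neighborhood_id INTEGER);
CREATE TABLE listings (id INTEGER, location_id INTEGER, price INTEGER);
INSERT INTO locations VALUES (1, 6), (2, 7);
INSERT INTO listings VALUES (100, 1, 120);
""")

# Inner join: location 2 has no matching listing, so it is dropped
inner = demo_conn.execute("""
    SELECT locations.id, listings.price FROM locations
    JOIN listings ON listings.location_id = locations.id
    ORDER BY locations.id""").fetchall()

# Left join: location 2 is kept, with NULL (None in Python) for the price
left = demo_conn.execute("""
    SELECT locations.id, listings.price FROM locations
    LEFT JOIN listings ON listings.location_id = locations.id
    ORDER BY locations.id""").fetchall()

print(inner)  # [(1, 120)]
print(left)   # [(1, 120), (2, None)]
```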

### JOINs

For the following queries, use the relationships between the tables to find the solutions.

* What are the longitude and latitude of the listing with the highest price?

In [None]:
cursor.execute("""select longitude, latitude
from locations
join listings on locations.id = listings.location_id
order by price desc
limit 1""")
cursor.fetchall()

# [(-73.91651, 40.7681)]

* What is the neighborhood id of the listing with the lowest price?

In [None]:
cursor.execute("""select neighborhoods.id  from neighborhoods 
join locations on locations.neighborhood_id = neighborhoods.id 
join listings on listings.location_id = locations.id 
order by price asc limit 1  """)
cursor.fetchall()
# [(6,)]
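Because `locations` already stores `neighborhood_id`, the neighborhood id of the cheapest listing can be read without joining `neighborhoods`; that third table is needed only for columns that live there, such as `neighborhoods.name`. A sketch comparing both forms on invented rows:

```python
import sqlite3

# Invented rows for illustration; not the lab's data
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE neighborhoods (id INTEGER, name TEXT);
CREATE TABLE locations (id INTEGER, neighborhood_id INTEGER);
CREATE TABLE listings (id INTEGER, location_id INTEGER, price INTEGER);
INSERT INTO neighborhoods VALUES (6, 'Hood A'), (7, 'Hood B');
INSERT INTO locations VALUES (1, 6), (2, 7);
INSERT INTO listings VALUES (100, 1, 40), (101, 2, 90);
""")

# Three-table form, as in the lab
three_tables = demo_conn.execute("""
    SELECT neighborhoods.id FROM neighborhoods
    JOIN locations ON locations.neighborhood_id = neighborhoods.id
    JOIN listings ON listings.location_id = locations.id
    ORDER BY price ASC LIMIT 1""").fetchall()

# Two-table form: the foreign key in locations already holds the id
two_tables = demo_conn.execute("""
    SELECT locations.neighborhood_id FROM locations
    JOIN listings ON listings.location_id = locations.id
    ORDER BY price ASC LIMIT 1""").fetchall()

print(three_tables, two_tables)  # [(6,)] [(6,)]
```

The two forms agree as long as every `locations.neighborhood_id` has a matching row in `neighborhoods`; the inner join to `neighborhoods` would silently drop a location whose id has no match.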

* What are the longitude and latitude of the listing with the lowest price?

In [None]:
cursor.execute("""select longitude, latitude from locations  
join listings on listings.location_id = locations.id 
order by price asc limit 1  """)
cursor.fetchall()

# [(-73.95428000000001, 40.69023)]

### Relations and GROUP BY

* What is the name of the host whose listings have the most reviews in total?

In [None]:
cursor.execute("""select host_name, sum(number_of_reviews) from hosts  
join listings on listings.host_id = hosts.id
group by host_id
order by sum(number_of_reviews) desc  limit  1""")
cursor.fetchall()

# [('Maya', 2273)]
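`GROUP BY` collapses all of a host's listings into one row, `SUM(number_of_reviews)` adds up the reviews inside each group, and `ORDER BY` then ranks the groups. The invented rows below make the per-group arithmetic visible.

```python
import sqlite3

# Invented rows for illustration: Ana has two listings, Ben has one
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE hosts (id INTEGER, host_name TEXT);
CREATE TABLE listings (id INTEGER, host_id INTEGER, number_of_reviews INTEGER);
INSERT INTO hosts VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO listings VALUES (10, 1, 5), (11, 1, 7), (12, 2, 9);
""")

# One row per host: Ana has 5 + 7 = 12 reviews, Ben has 9
totals = demo_conn.execute("""
    SELECT hosts.host_name, SUM(listings.number_of_reviews) AS total_reviews
    FROM hosts
    JOIN listings ON listings.host_id = hosts.id
    GROUP BY hosts.id
    ORDER BY total_reviews DESC""").fetchall()

print(totals)  # [('Ana', 12), ('Ben', 9)]
```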

* What is the name of the host with the highest average listing price?

In [None]:
cursor.execute("""select host_name from hosts
join listings on listings.host_id = hosts.id
group by listings.host_id
order by avg(price) desc
limit 1
""")
cursor.fetchall()

# [('Jelena',)]

* What is the name of the host with the lowest average listing price?

In [None]:
cursor.execute("""select host_name from hosts
join listings on listings.host_id = hosts.id
group by listings.host_id
order by avg(price) asc
limit 1
""")
cursor.fetchall()

# [('Aymeric',)]
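An aggregate can be used in `ORDER BY` without appearing in the `SELECT` list, so a query can return only `host_name` while still ranking hosts by `AVG(price)`. With `LIMIT 1`, hosts that tie on the average are broken arbitrarily; adding a second sort key such as `host_name` makes the answer repeatable. A sketch on invented rows, where two hosts tie:

```python
import sqlite3

# Invented rows for illustration: Cleo and Ana both average 100
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE hosts (id INTEGER, host_name TEXT);
CREATE TABLE listings (id INTEGER, host_id INTEGER, price INTEGER);
INSERT INTO hosts VALUES (1, 'Cleo'), (2, 'Ana'), (3, 'Ben');
INSERT INTO listings VALUES (10, 1, 50), (11, 1, 150), (12, 2, 100), (13, 3, 300);
""")

# AVG(price) ranks the hosts without being selected;
# host_name breaks the tie deterministically
cheapest = demo_conn.execute("""
    SELECT hosts.host_name FROM hosts
    JOIN listings ON listings.host_id = hosts.id
    GROUP BY hosts.id
    ORDER BY AVG(listings.price) ASC, hosts.host_name ASC
    LIMIT 1""").fetchall()

print(cheapest)  # [('Ana',)]
```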

* What is the name of the neighborhood with the most locations?

In [None]:
cursor.execute("""select neighborhoods.name from neighborhoods
join locations on locations.neighborhood_id = neighborhoods.id
group by neighborhoods.id
order by count(locations.id) desc
limit 1""")
cursor.fetchall()

# [('Williamsburg',)]
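When a count follows a chain of joins, it counts joined rows rather than distinct entities. If a location had several listings, joining `listings` onto `locations` would repeat that location once per listing and inflate a per-neighborhood location count; counting `DISTINCT locations.id`, or leaving `listings` out of the join when no listing columns are needed, avoids this. Whether the lab's data actually has locations with multiple listings is not shown here; the rows below are invented.

```python
import sqlite3

# Invented rows for illustration: location 10 has three listings
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE neighborhoods (id INTEGER, name TEXT);
CREATE TABLE locations (id INTEGER, neighborhood_id INTEGER);
CREATE TABLE listings (id INTEGER, location_id INTEGER);
INSERT INTO neighborhoods VALUES (1, 'Hood A');
INSERT INTO locations VALUES (10, 1), (11, 1);
INSERT INTO listings VALUES (100, 10), (101, 10), (102, 10), (103, 11);
""")

# The join produces 4 rows, but they cover only 2 distinct locations
joined_rows, distinct_locations = demo_conn.execute("""
    SELECT COUNT(*), COUNT(DISTINCT locations.id) FROM neighborhoods
    JOIN locations ON locations.neighborhood_id = neighborhoods.id
    JOIN listings ON listings.location_id = locations.id
    GROUP BY neighborhoods.id""").fetchone()

print(joined_rows, distinct_locations)  # 4 2
```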

* What are the names of the neighborhoods with 10 locations?

In [None]:
cursor.execute("""select neighborhoods.name from neighborhoods
join locations on locations.neighborhood_id = neighborhoods.id
group by neighborhoods.id
having count(locations.id) = 10""")
cursor.fetchall()

# [('North Riverdale',),
#  ('Great Kills',),
#  ('East Morrisania',),
#  ('Melrose',),
#  ('Bergen Beach',),
#  ('Westchester Square',)]
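`WHERE` filters individual rows before they are grouped, while `HAVING` filters whole groups after aggregates such as `COUNT()` are computed, which is why a condition on a count belongs in `HAVING`. A sketch on invented rows:

```python
import sqlite3

# Invented rows: neighborhood 1 has 2 locations, neighborhood 2 has 3
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE locations (id INTEGER, neighborhood_id INTEGER);
INSERT INTO locations VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 2);
""")

# HAVING keeps only the groups whose count is exactly 2
exactly_two = demo_conn.execute("""
    SELECT neighborhood_id FROM locations
    GROUP BY neighborhood_id
    HAVING COUNT(id) = 2""").fetchall()

# WHERE removes rows first (id > 1), so neighborhood 1 keeps one location
after_where = demo_conn.execute("""
    SELECT neighborhood_id, COUNT(id) FROM locations
    WHERE id > 1
    GROUP BY neighborhood_id
    ORDER BY neighborhood_id""").fetchall()

print(exactly_two)  # [(1,)]
print(after_where)  # [(1, 1), (2, 3)]
```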

The following questions require joining all three tables: `neighborhoods`, `locations`, and `listings`.

* What is the average occupancy of each neighborhood (limit to the first five results)?

In [None]:
cursor.execute(""" select neighborhoods.name, avg(occupancy) from neighborhoods
join locations on locations.neighborhood_id = neighborhoods.id
join listings on listings.location_id = locations.id

group by neighborhoods.id
limit 5
""")

cursor.fetchall()

# [('Kensington', 281.0514285714286),
#  ('Midtown', 207.29644012944985),
#  ('Harlem', 258.4224981188864),
#  ('Clinton Hill', 269.986013986014),
#  ('East Harlem', 266.0268576544315)]
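Without an `ORDER BY`, which groups come back "first" depends on how SQLite happens to execute the query, not on anything the query guarantees. Sorting explicitly, here by neighborhood name, makes the first few rows repeatable. The sketch uses invented rows and `LIMIT 2` for brevity.

```python
import sqlite3

# Invented rows for illustration; not the lab's data
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE neighborhoods (id INTEGER, name TEXT);
CREATE TABLE locations (id INTEGER, neighborhood_id INTEGER);
CREATE TABLE listings (id INTEGER, location_id INTEGER, occupancy INTEGER);
INSERT INTO neighborhoods VALUES (1, 'Zeta'), (2, 'Alpha'), (3, 'Mid');
INSERT INTO locations VALUES (10, 1), (11, 2), (12, 3);
INSERT INTO listings VALUES (100, 10, 200), (101, 11, 100), (102, 11, 300), (103, 12, 50);
""")

# An explicit ORDER BY fixes which groups count as the "first" ones
first_two = demo_conn.execute("""
    SELECT neighborhoods.name, AVG(listings.occupancy) FROM neighborhoods
    JOIN locations ON locations.neighborhood_id = neighborhoods.id
    JOIN listings ON listings.location_id = locations.id
    GROUP BY neighborhoods.id
    ORDER BY neighborhoods.name
    LIMIT 2""").fetchall()

print(first_two)  # [('Alpha', 200.0), ('Mid', 50.0)]
```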

* What is the total number of reviews for each neighborhood (limit to the first five results)?

In [None]:
cursor.execute(""" select neighborhoods.name, sum(number_of_reviews) from neighborhoods
join locations on locations.neighborhood_id = neighborhoods.id
join listings on listings.location_id = locations.id

group by neighborhoods.id
limit 5
""")

cursor.fetchall()

# [('Kensington', 2972),
#  ('Midtown', 19444),
#  ('Harlem', 75962),
#  ('Clinton Hill', 14586),
#  ('East Harlem', 36446)]

* Write a query that returns the name and average listing price of each neighborhood (limit to the first five results).

In [None]:
cursor.execute(""" select neighborhoods.name, avg(price) from neighborhoods
join locations on locations.neighborhood_id = neighborhoods.id
join listings on listings.location_id = locations.id

group by neighborhoods.id
limit 5
""")

cursor.fetchall()


# [('Kensington', 92.88571428571429),
#  ('Midtown', 282.7190938511327),
#  ('Harlem', 118.97404063205417),
#  ('Clinton Hill', 181.89335664335664),
#  ('East Harlem', 133.1987466427932)]
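The last three questions share the same joins and grouping, so their aggregates can also be computed together in one query. A sketch on invented rows; the column aliases such as `avg_occupancy` are illustrative names, not columns in the lab's tables.

```python
import sqlite3

# Invented rows for illustration; not the lab's data
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE neighborhoods (id INTEGER, name TEXT);
CREATE TABLE locations (id INTEGER, neighborhood_id INTEGER);
CREATE TABLE listings (id INTEGER, location_id INTEGER,
                       occupancy INTEGER, number_of_reviews INTEGER, price INTEGER);
INSERT INTO neighborhoods VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO locations VALUES (10, 1), (11, 2);
INSERT INTO listings VALUES (100, 10, 100, 4, 80), (101, 10, 300, 6, 120),
                            (102, 11, 50, 1, 200);
""")

# One row per neighborhood with all three statistics side by side
summary = demo_conn.execute("""
    SELECT neighborhoods.name,
           AVG(listings.occupancy)         AS avg_occupancy,
           SUM(listings.number_of_reviews) AS total_reviews,
           AVG(listings.price)             AS avg_price
    FROM neighborhoods
    JOIN locations ON locations.neighborhood_id = neighborhoods.id
    JOIN listings  ON listings.location_id = locations.id
    GROUP BY neighborhoods.id
    ORDER BY neighborhoods.name""").fetchall()

print(summary)  # [('Alpha', 200.0, 10, 100.0), ('Beta', 50.0, 1, 200.0)]
```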

### Conclusion
In this lab we worked with "Has One" and "Has Many" relationships in SQL. We began by reviewing each table's schema to map out how the tables relate, which showed us how to join them in our queries. We finished the lab by writing queries whose JOIN clauses connect the tables along these relationships.