In [1]:
import pandas as pd 
import numpy as np 
import seaborn as sns

# Capstone 2 Kickoff
​
The week of Monday, August 3rd will be devoted to executing and presenting your own four-minute capstone slide presentation using work done in pandas and/or Tableau.
​
# Data
Your data can be found here: [AirBnB Data](http://insideairbnb.com/get-the-data.html)
​
You can analyze any single city in the dataset. Please let me know which city you want to analyze so that I can monitor usage. If you want to analyze more than one city, please slack me a proposal for approval.
​
Note: Some of these files are gzip-compressed csvs. Google and/or the pandas docs can help you figure out what to do with them.
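
As a minimal sketch of the gzip note above: `pandas.read_csv` infers compression from the file extension by default, so a `.csv.gz` file can usually be read directly without unzipping it first. The tiny file written here is a stand-in (made-up rows, hypothetical file name) so that the example runs on its own; with a real download you would pass the path of the file you saved.

```python
import gzip
import os
import tempfile

import pandas as pd

# Write a tiny gzip-compressed CSV as a stand-in for a real download (made-up rows).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "listings_sample.csv.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as f:
    f.write("id,name,price\n1,Example room,100\n2,Example flat,225\n")

# compression="infer" is the default; it is spelled out here only for clarity.
sample = pd.read_csv(sample_path, compression="infer")
print(sample.shape)
```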
​
​
## Schedule
​
The rough process for this is:
* Monday:
    * Identify your interests
    * Locate related data sources
* Tuesday:
    * Explore your data sources
    * Think of questions you could answer with the data
    * Scope your questions/point of view to be answerable in a few days
    * This might take more than one try!
* Wednesday:
    * Make your first draft presentation (4 minutes only)
* Thursday:
    * Deliver first draft presentation. Revise as needed.
* Friday:
    * Deliver final presentation (4 minutes only).
​
​
## Learning objectives: What you owe yourself
​
By the end of the week you should have increased your skill in:
* Obtaining data sources
* Converting your interests + data sources -> a specific question
* Scoping a project to fit within a certain amount of time
  * (No one is good at this)
* Attending daily check-in meetings
* Presenting by yourself on a deadline
* Note: Tableau Dashboards are a nice to have, but not required for Capstone 2!
​
This is not something we do for you. This is something that you do for yourself, by way of achieving the deliverables.
​
## Presentation Criteria:
* Your presentation should study the data from a point of view. For example, are you a host, are you a government thinking about adding regulation, are you a business thinking of competing against AirBnB in your target city, etc.
​
* Produce a 3-4 page presentation, excluding the cover page, conclusion page, and an optional, clearly marked appendix.
* Maximum of 3 graphs per page 
* Maximum ~50 words of text per slide
* Final presentation should take a MAXIMUM of four minutes
* Any final presentation that exceeds the four (4) page maximum or fails to meet any of the other criteria listed above will be rejected!
​
## Deliverables: What you owe us
​
* PRE-WORK - Friday 7/31
* MAKE A NEW DIRECTORY and git repo called `YOURNAME-CAPSTONE2`, with a subdirectory called `YOURNAME-CAPSTONE2/DATA` inside it.
* Download the dataset(s) and put all the Capstone 2 data files in `YOURNAME-CAPSTONE2/DATA`.
* Examine the CSV files. What should the data type of each column be once the CSV is loaded into pandas?
* Load your CSV files into pandas.
* Explore the data! 
   1. Is the data clean?
   1. Are there values/columns that you should drop?
* Output your data from pandas to a csv. 
* Make one simple visualization using seaborn/matplotlib/Tableau and send the URL to the class channel.
* Make a repo on github and push your files.
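
The pre-work steps above (checking data types, looking for missing values, dropping what you don't need, and writing a CSV) could be sketched roughly as follows. The rows are made up, the column names follow the summary listings file, and the cleaning decision shown is an assumption for illustration, not a rule.

```python
import io

import pandas as pd

# Hypothetical rows; only the column names mirror the summary listings file.
raw = io.StringIO(
    "id,name,room_type,price,last_review,reviews_per_month\n"
    "1,Example room,Private room,100,2019-11-04,0.37\n"
    "2,Example flat,Entire home/apt,225,,\n"
    "3,,Private room,60,2020-06-07,4.64\n"
)
listings = pd.read_csv(raw)

# Inspect what pandas inferred before changing anything.
print(listings.dtypes)
print(listings.isna().sum())

# Dates arrive as plain strings; convert them explicitly.
listings["last_review"] = pd.to_datetime(listings["last_review"])
# A column with only a few distinct values can be stored as a category.
listings["room_type"] = listings["room_type"].astype("category")

# Example cleaning decision (an assumption): drop rows that have no name.
cleaned = listings.dropna(subset=["name"])

# index=False keeps the row index out of the output, so reloading the file
# does not produce an extra "Unnamed: 0" column.
csv_text = cleaned.to_csv(index=False)
print(csv_text)
```

Passing a file name to `to_csv` instead of leaving the path out writes the same text to disk.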
​
​
* Day 1 - Monday 
  * Monday Morning: Do more visualizations
  * Monday Afternoon: Based on your visualizations, think about 2-5 topics you might like to explore, and describe them in a few sentences.  
  Present them at check-in.
​
* Day 2 - Tuesday
  * Morning Stand Up: 
  Say, in 60 secs or less apiece:
      1. What 2-5 topics might you explore for this capstone?
      1. What did you do yesterday?
      1. What are you going to do today?
      1. What is your biggest challenge?
      1. Where do you need help?
  * Continue work on pandas analysis and your visualizations that address your topics of interest for the rest of the day.
  * Come up with a point of view!
​
​
* Day 3 - Wednesday
  * Morning Stand Up: 
  Say, in 60 secs or less apiece:
      1. What did you do yesterday?
      1. What are you going to do today?
      1. What is your biggest challenge?
      1. Where do you need help?
  * 11AM **DEADLINE** to finish your visualizations. Make them look as pretty as you can.
  * Prepare your slides and talking points for your draft four-minute presentation to be delivered on Thursday to the class.
  
* Day 4 - Thursday
  * Morning Stand Up: 
  Say, in 60 secs or less apiece:
      1. What did you do yesterday?
      1. What are you going to do today?
      1. What is your biggest challenge?
      1. Where do you need help?
  * 10am **DEADLINE**: FINISH YOUR FIRST PRESENTATION DRAFT
  * 10am: Give a dry-run/practice presentation to your peers and staff in small private breakout rooms
  * Listen to feedback from staff and your peers on how to improve your presentation.
  * Cut your presentation since it is no doubt too long! 
​
​
* Day 5 - Friday - Presentation Day!
  * Morning: Give Final presentation to class.
  * Afternoon: Relax, it's done!
​
* Sometime AFTER Day 5 - Capstone Diary Entry
  * Prepare a written document with more details on what you did and how you did it during your solo week, solely for yourself. Write about your feelings during the process and more details about your struggle. No one else will see this but you. It is your second Capstone diary entry. Due date: NEVER
  * Write a good `README.md` file in the main directory, based on your diary and your official presentation, that describes what you've done and the technologies you've used to do it. If someone looks at the project, this might be all they see. Make it clear.
  
### Example Presentation Format:
1. Project description
  * What is it?
  * Why is it interesting?
1. Data acquisition
  * How did you do it? Any tricks?
1. Exploratory data analysis
  * Three or four graphs
  * Make sure to include titles and axis labels
1. Conclusion and next steps.
  * What did you learn about your data?
  * What did you learn about data analysis?
  * What advice would you give yourself?
  * Do you want to continue with this topic?
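
As a small illustration of the "titles and axis labels" point above, here is a minimal matplotlib sketch. The numbers are invented for the example, and the output file name is hypothetical; the same labelling calls apply to axes produced by seaborn.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical numbers, for illustration only.
prices = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Shared room"],
    "median_price": [150, 70, 40],
})

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(prices["room_type"], prices["median_price"])

# Every chart on a slide should say what it shows and what each axis measures.
ax.set_title("Median nightly price by room type (made-up data)")
ax.set_xlabel("Room type")
ax.set_ylabel("Median price (USD)")
fig.tight_layout()

# Save to an image file that can be dropped into a slide.
out_path = os.path.join(tempfile.mkdtemp(), "price_by_room_type.png")
fig.savefig(out_path, dpi=150)
```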


In [3]:
df = pd.read_csv('listings.csv')

In [4]:
df

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2060,Modern NYC,2259,Jenny,Manhattan,Washington Heights,40.857220,-73.937900,Private room,100,1,1,2008-09-22,0.01,1,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.753620,-73.983770,Entire home/apt,225,3,48,2019-11-04,0.37,2,335
2,3831,"Whole flr w/private bdrm, bath & kitchen(pls r...",4869,LisaRoxanne,Brooklyn,Clinton Hill,40.685140,-73.959760,Entire home/apt,89,1,322,2020-06-07,4.64,1,276
3,5099,Large Cozy 1 BR Apartment In Midtown East,7322,Chris,Manhattan,Murray Hill,40.747670,-73.975000,Entire home/apt,200,3,78,2019-10-13,0.58,1,0
4,5121,BlissArtsSpace!,7356,Garon,Brooklyn,Bedford-Stuyvesant,40.686880,-73.955960,Private room,60,29,50,2019-12-02,0.37,1,365
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49525,43702714,Spacious and Luxurious Noho 3 Bedroom Apartment,18880232,Byron,Manhattan,East Village,40.729553,-73.990712,Entire home/apt,173,3,0,,,1,365
49526,43702765,2-bedroom entire place for you in Bushwick,804056,Philip,Brooklyn,Bushwick,40.697820,-73.913960,Entire home/apt,99,3,0,,,1,135
49527,43703128,"Simple Spacious Manhattan Room for 2Near 2,3 T...",137358866,Kaz,Manhattan,Harlem,40.812614,-73.942075,Private room,49,28,0,,,160,8
49528,43703156,Cosy Room Available in East Village,26846438,Marine,Manhattan,Civic Center,40.713753,-74.005350,Private room,94,2,0,,,1,12


In [7]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49530 entries, 0 to 49529
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              49530 non-null  int64  
 1   name                            49512 non-null  object 
 2   host_id                         49530 non-null  int64  
 3   host_name                       49524 non-null  object 
 4   neighbourhood_group             49530 non-null  object 
 5   neighbourhood                   49530 non-null  object 
 6   latitude                        49530 non-null  float64
 7   longitude                       49530 non-null  float64
 8   room_type                       49530 non-null  object 
 9   price                           49530 non-null  int64  
 10  minimum_nights                  49530 non-null  int64  
 11  number_of_reviews               49530 non-null  int64  
 12  last_review                     

In [8]:
df.describe()

Unnamed: 0,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
count,49530.0,49530.0,49530.0,49530.0,49530.0,49530.0,49530.0,38211.0,49530.0,49530.0
mean,22959640.0,85099510.0,40.729238,-73.951042,162.643872,8.19154,23.867515,1.008095,6.23303,126.666848
std,13526830.0,98875970.0,0.054674,0.047547,419.312316,21.974833,48.245823,1.345213,25.485293,142.381428
min,2060.0,2259.0,40.49979,-74.24084,0.0,1.0,0.0,0.01,1.0,0.0
25%,10850500.0,9269052.0,40.68982,-73.983367,68.0,2.0,1.0,0.15,1.0,0.0
50%,22336020.0,38004830.0,40.72384,-73.95535,101.0,3.0,5.0,0.45,1.0,79.0
75%,35577790.0,137358900.0,40.76279,-73.93429,175.0,6.0,23.0,1.42,2.0,267.0
max,43703360.0,349082600.0,40.91169,-73.71299,10000.0,1250.0,746.0,53.8,280.0,365.0
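
The summary above hints at a few cleanliness questions: `reviews_per_month` has 38,211 non-null values against 49,530 rows, `price` ranges from 0 to 10,000, and `minimum_nights` reaches 1,250. A minimal sketch of how such rows could be surfaced, using made-up sample rows and thresholds that are assumptions rather than rules:

```python
import numpy as np
import pandas as pd

# Hypothetical rows that mimic the patterns visible in the describe() output.
sample = pd.DataFrame({
    "price": [0, 68, 101, 175, 10000],
    "minimum_nights": [1, 2, 3, 6, 1250],
    "reviews_per_month": [0.15, np.nan, 0.45, 1.42, np.nan],
})

# Count missing values per column.
missing = sample.isna().sum()

# Flag rows that deserve a closer look; both thresholds are assumptions.
suspicious = sample[(sample["price"] <= 0) | (sample["minimum_nights"] > 365)]

print(missing)
print(suspicious)
```

Whether flagged rows should be dropped, capped, or kept depends on the point of view you choose for the presentation.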


In [11]:
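# pandas infers gzip compression from the .gz extension, so no gzip import is needed.
# low_memory=False reads the file in one pass, which avoids mixed-dtype guesses across chunks.
df2 = pd.read_csv('listings.csv.gz', low_memory=False)
df2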



Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,...,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,2060,https://www.airbnb.com/rooms/2060,20200608144437,2020-06-09,Modern NYC,,"Lovely, spacious, sunny 1 BR apartment in 6th ...","Lovely, spacious, sunny 1 BR apartment in 6th ...",none,,...,f,f,flexible,f,f,1,0,1,0,0.01
1,2595,https://www.airbnb.com/rooms/2595,20200608144437,2020-06-09,Skylit Midtown Castle,"Beautiful, spacious skylit studio in the heart...","- Spacious (500+ft²), immaculate and nicely fu...","Beautiful, spacious skylit studio in the heart...",none,Centrally located in the heart of Manhattan ju...,...,f,f,strict_14_with_grace_period,t,t,2,2,0,0,0.37
2,3831,https://www.airbnb.com/rooms/3831,20200608144437,2020-06-09,"Whole flr w/private bdrm, bath & kitchen(pls r...","Enjoy 500 s.f. top floor in 1899 brownstone, w...",We host on the entire top floor of our double-...,"Enjoy 500 s.f. top floor in 1899 brownstone, w...",none,Just the right mix of urban center and local n...,...,f,f,flexible,f,f,1,1,0,0,4.64
3,5099,https://www.airbnb.com/rooms/5099,20200608144437,2020-06-09,Large Cozy 1 BR Apartment In Midtown East,My large 1 bedroom apartment has a true New Yo...,I have a large 1 bedroom apartment centrally l...,My large 1 bedroom apartment has a true New Yo...,none,My neighborhood in Midtown East is called Murr...,...,f,f,moderate,t,t,1,1,0,0,0.58
4,5121,https://www.airbnb.com/rooms/5121,20200608144437,2020-06-09,BlissArtsSpace!,,HELLO EVERYONE AND THANKS FOR VISITING BLISS A...,HELLO EVERYONE AND THANKS FOR VISITING BLISS A...,none,,...,f,f,strict_14_with_grace_period,f,f,1,0,1,0,0.37
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49525,43702714,https://www.airbnb.com/rooms/43702714,20200608144437,2020-06-09,Spacious and Luxurious Noho 3 Bedroom Apartment,Beautiful spacious NoHo apartment. Minutes awa...,,Beautiful spacious NoHo apartment. Minutes awa...,none,,...,t,f,flexible,f,f,1,1,0,0,
49526,43702765,https://www.airbnb.com/rooms/43702765,20200608144437,2020-06-09,2-bedroom entire place for you in Bushwick,"2-bedroom in Bushwick, Brooklyn, 1000 sqf, 2 b...",,"2-bedroom in Bushwick, Brooklyn, 1000 sqf, 2 b...",none,,...,f,f,flexible,f,f,1,1,0,0,
49527,43703128,https://www.airbnb.com/rooms/43703128,20200608144437,2020-06-09,"Simple Spacious Manhattan Room for 2Near 2,3 T...",This is an affordable master bedroom for 2 in ...,*************************************** Curren...,This is an affordable master bedroom for 2 in ...,none,Harlem is one of the largest areas in Manhatta...,...,f,f,moderate,f,f,160,4,156,0,
49528,43703156,https://www.airbnb.com/rooms/43703156,20200608144437,2020-06-09,Cosy Room Available in East Village,"10 minutes from L train and F Train Coffee, re...",,"10 minutes from L train and F Train Coffee, re...",none,,...,f,f,flexible,f,f,1,0,1,0,


In [14]:
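# index=False leaves the row index out, so reloading does not add an "Unnamed: 0" column.
df.to_csv('df.csv', index=False)
df2.to_csv('df2.csv', index=False)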