# Week05 - Hands on with Pandas

In [62]:
import pandas as pd

## Loading the data
This loading pattern stays the same for any Google spreadsheet. If you have another one, you only have to change <code>url_id</code>.

In [22]:
base_url = "https://docs.google.com/spreadsheets/d/"
url_id = "1NV-k5237esNE6lugwYw4hZVoRcYi8SFtG4fzX3aokAo/"
export = "export/format=excel"

whole_url = base_url + url_id + export

Different Python and pandas versions handle some cases differently when building dataframes. For example, this dataset stores many values as the string "None". In my setup, all "None" values are automatically converted to NaN (missing values). To get the same result in your setup, specify <code>na_values="None"</code>.

In [63]:
df = pd.read_excel(whole_url, na_values="None")
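To see what <code>na_values</code> does without downloading the spreadsheet, here is a minimal sketch on a made-up three-row CSV (the sample text and the variable names are assumptions for illustration only). It uses <code>pd.read_csv</code>, which accepts the same <code>na_values</code> and <code>keep_default_na</code> parameters as <code>pd.read_excel</code>.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real spreadsheet.
csv_text = "id,pets_allowed\n1,None\n2,Cats\n3,None\n"

# Explicitly treat the string "None" as a missing value.
sample = pd.read_csv(io.StringIO(csv_text), na_values="None")
n_missing = int(sample["pets_allowed"].isna().sum())

# Switch off the built-in list of missing-value markers:
# now "None" is kept as ordinary text.
sample_raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
n_text_none = int((sample_raw["pets_allowed"] == "None").sum())

print(n_missing, n_text_none)
```

Passing <code>na_values="None"</code> explicitly makes the result independent of which strings your pandas version treats as missing by default.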

### Description of the dataset

This is a dataset of classified ads for apartments for rent in the USA.
Source: https://archive.ics.uci.edu/dataset/555/apartment+for+rent+classified

- **id** — unique identifier of the apartment
- **category** — category of the classified
- **title** — title text of the apartment
- **body** — body text of the apartment
- **amenities** — features like AC, basketball court, cable, gym, internet access, pool, refrigerator, etc.
- **bathrooms** — number of bathrooms
- **bedrooms** — number of bedrooms
- **currency** — currency of the price
- **fee** — fee associated with the apartment
- **has_photo** — whether the apartment has a photo
- **pets_allowed** — allowed pets (dogs, cats, etc.)
- **price** — rental price of the apartment
- **price_display** — formatted price shown to the reader
- **price_type** — pricing period of the rent (for example, Monthly)
- **square_feet** — size of the apartment
- **address** — address of the apartment
- **cityname** — city where the apartment is located
- **state** — state where the apartment is located
- **latitude** — geographic latitude of the apartment
- **longitude** — geographic longitude of the apartment
- **source** — origin of the classified
- **time** — timestamp when the classified was created

## Inspecting the dataset
Every analysis begins by asking questions.

### What is the structure of my dataframe?

In [64]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 22 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             10000 non-null  int64  
 1   category       10000 non-null  object 
 2   title          10000 non-null  object 
 3   body           10000 non-null  object 
 4   amenities      6451 non-null   object 
 5   bathrooms      9966 non-null   float64
 6   bedrooms       9993 non-null   float64
 7   currency       10000 non-null  object 
 8   fee            10000 non-null  object 
 9   has_photo      10000 non-null  object 
 10  pets_allowed   5837 non-null   object 
 11  price          10000 non-null  int64  
 12  price_display  10000 non-null  object 
 13  price_type     10000 non-null  object 
 14  square_feet    10000 non-null  int64  
 15  address        6673 non-null   object 
 16  cityname       9923 non-null   object 
 17  state          9923 non-null   object 
 18  latitud

### How can I view the first five rows with all columns?
Set <code>pd.set_option('display.max_columns', None)</code> first so that pandas displays every column instead of truncating the table.

In [65]:
pd.set_option('display.max_columns', None)
df.head(5)

Unnamed: 0,id,category,title,body,amenities,bathrooms,bedrooms,currency,fee,has_photo,pets_allowed,price,price_display,price_type,square_feet,address,cityname,state,latitude,longitude,source,time
0,5668626895,housing/rent/apartment,"Studio apartment 2nd St NE, Uhland Terrace NE,...","This unit is located at second St NE, Uhland T...",,,0.0,USD,No,Thumbnail,,790,790,Monthly,101,,Washington,DC,38.9057,-76.9861,RentLingo,1577359415
1,5664597177,housing/rent/apartment,Studio apartment 814 Schutte Road,"This unit is located at 814 Schutte Road, Evan...",,,1.0,USD,No,Thumbnail,,425,425,Monthly,106,814 Schutte Rd,Evansville,IN,37.968,-87.6621,RentLingo,1577017063
2,5668626833,housing/rent/apartment,"Studio apartment N Scott St, 14th St N, Arling...","This unit is located at N Scott St, 14th St N,...",,1.0,0.0,USD,No,Thumbnail,,1390,1390,Monthly,107,,Arlington,VA,38.891,-77.0816,RentLingo,1577359410
3,5659918074,housing/rent/apartment,Studio apartment 1717 12th Ave,"This unit is located at 1717 12th Ave, Seattle...",,1.0,0.0,USD,No,Thumbnail,,925,925,Monthly,116,1717 12th Avenue,Seattle,WA,47.616,-122.3275,RentLingo,1576667743
4,5668626759,housing/rent/apartment,"Studio apartment Washington Blvd, N Cleveland ...","This unit is located at Washington Blvd, N Cle...",,,0.0,USD,No,Thumbnail,,880,880,Monthly,125,,Arlington,VA,38.8738,-77.1055,RentLingo,1577359401


### How can I get the names of all columns?

In [66]:
df.columns

Index(['id', 'category', 'title', 'body', 'amenities', 'bathrooms', 'bedrooms',
       'currency', 'fee', 'has_photo', 'pets_allowed', 'price',
       'price_display', 'price_type', 'square_feet', 'address', 'cityname',
       'state', 'latitude', 'longitude', 'source', 'time'],
      dtype='object')

### How do I quickly find the number of rows and columns of my df?

In [67]:
df.shape

(10000, 22)

## Index and Slicing

### How do I select a column?

In [69]:
df["pets_allowed"]

0       NaN
1       NaN
2       NaN
3       NaN
4       NaN
       ... 
9995    NaN
9996    NaN
9997    NaN
9998    NaN
9999    NaN
Name: pets_allowed, Length: 10000, dtype: object

In [68]:
# use double square brackets to get a DataFrame instead of a Series
df[["pets_allowed"]]

Unnamed: 0,pets_allowed
0,
1,
2,
3,
4,
...,...
9995,
9996,
9997,
9998,
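The difference between single and double square brackets can be checked directly. This is a small sketch on a made-up dataframe (the name <code>toy</code> and its values are assumptions, not part of the dataset):

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "pets_allowed": ["Cats", None, "Dogs"],
    "price": [900, 1200, 750],
})

as_series = toy["pets_allowed"]    # single brackets -> Series (1-dimensional)
as_frame = toy[["pets_allowed"]]   # double brackets -> DataFrame with one column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)

# Double brackets also accept several column names at once.
two_columns = toy[["pets_allowed", "price"]]
```

The inner pair of brackets is just a Python list of column names, which is why the result stays a dataframe even with a single name.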


### How do I know the unique values in one column?

In [70]:
df["pets_allowed"].unique()

array([nan, 'Cats,Dogs', 'Cats', 'Dogs'], dtype=object)

### How do I know how many unique values there are in one column?

In [71]:
df["pets_allowed"].nunique(dropna=True)

3

### I want to know the specific counts of the unique values.

In [72]:
df["pets_allowed"].value_counts(dropna=False)

pets_allowed
Cats,Dogs    5228
NaN          4163
Cats          485
Dogs          124
Name: count, dtype: int64
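The <code>dropna</code> argument decides whether missing values are counted. A minimal sketch on a made-up Series (the values are assumptions chosen to mirror the column above):

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only.
pets = pd.Series(["Cats,Dogs", np.nan, "Cats", "Dogs", "Cats,Dogs", np.nan])

n_without_nan = pets.nunique(dropna=True)    # NaN is ignored
n_with_nan = pets.nunique(dropna=False)      # NaN counts as one extra value

counts_all = pets.value_counts(dropna=False)     # includes a row for NaN
counts_known = pets.value_counts()               # default: NaN is left out
shares = pets.value_counts(normalize=True)       # relative frequencies instead of counts

print(n_without_nan, n_with_nan)
print(counts_all)
```

<code>normalize=True</code> is handy when you care about proportions rather than absolute counts.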

### How do I select only the third row by index position (position 2, counting from 0)?

In [75]:
df.iloc[2]

id                                                      5668626833
category                                    housing/rent/apartment
title            Studio apartment N Scott St, 14th St N, Arling...
body             This unit is located at N Scott St, 14th St N,...
amenities                                                      NaN
bathrooms                                                      1.0
bedrooms                                                       0.0
currency                                                       USD
fee                                                             No
has_photo                                                Thumbnail
pets_allowed                                                   NaN
price                                                         1390
price_display                                                 1390
price_type                                                 Monthly
square_feet                                                   

In [76]:
# or use double square brackets to get the row back as a DataFrame
df.iloc[[2]]

Unnamed: 0,id,category,title,body,amenities,bathrooms,bedrooms,currency,fee,has_photo,pets_allowed,price,price_display,price_type,square_feet,address,cityname,state,latitude,longitude,source,time
2,5668626833,housing/rent/apartment,"Studio apartment N Scott St, 14th St N, Arling...","This unit is located at N Scott St, 14th St N,...",,1.0,0.0,USD,No,Thumbnail,,1390,1390,Monthly,107,,Arlington,VA,38.891,-77.0816,RentLingo,1577359410


### How do I select the first 8 rows with all columns?

In [78]:
df.iloc[:8,:]

Unnamed: 0,id,category,title,body,amenities,bathrooms,bedrooms,currency,fee,has_photo,pets_allowed,price,price_display,price_type,square_feet,address,cityname,state,latitude,longitude,source,time
0,5668626895,housing/rent/apartment,"Studio apartment 2nd St NE, Uhland Terrace NE,...","This unit is located at second St NE, Uhland T...",,,0.0,USD,No,Thumbnail,,790,790,Monthly,101,,Washington,DC,38.9057,-76.9861,RentLingo,1577359415
1,5664597177,housing/rent/apartment,Studio apartment 814 Schutte Road,"This unit is located at 814 Schutte Road, Evan...",,,1.0,USD,No,Thumbnail,,425,425,Monthly,106,814 Schutte Rd,Evansville,IN,37.968,-87.6621,RentLingo,1577017063
2,5668626833,housing/rent/apartment,"Studio apartment N Scott St, 14th St N, Arling...","This unit is located at N Scott St, 14th St N,...",,1.0,0.0,USD,No,Thumbnail,,1390,1390,Monthly,107,,Arlington,VA,38.891,-77.0816,RentLingo,1577359410
3,5659918074,housing/rent/apartment,Studio apartment 1717 12th Ave,"This unit is located at 1717 12th Ave, Seattle...",,1.0,0.0,USD,No,Thumbnail,,925,925,Monthly,116,1717 12th Avenue,Seattle,WA,47.616,-122.3275,RentLingo,1576667743
4,5668626759,housing/rent/apartment,"Studio apartment Washington Blvd, N Cleveland ...","This unit is located at Washington Blvd, N Cle...",,,0.0,USD,No,Thumbnail,,880,880,Monthly,125,,Arlington,VA,38.8738,-77.1055,RentLingo,1577359401
5,5667891676,housing/rent/apartment,0 BR in New York NY 10019,**RARE GEM WITH PRIVATE OUTDOOR TERRACE****AVA...,"Dishwasher,Elevator,Patio/Deck,Pool,Storage",1.0,0.0,USD,No,Thumbnail,,2475,2475,Monthly,130,350 West 50th St,Manhattan,NY,40.7629,-73.9885,Listanza,1577289784
6,5668627426,housing/rent/apartment,Studio apartment 2432 Penmar Ave,"This unit is located at 2432 Penmar Ave, Venic...",,,0.0,USD,No,Thumbnail,,1800,1800,Monthly,132,2432 Penmar Avenue,Venice,CA,33.9932,-118.4609,RentLingo,1577359461
7,5668626687,housing/rent/apartment,"Studio apartment Oak St NW, 16th St NW, Washin...","This unit is located at Oak St NW, 16th St NW,...",,,0.0,USD,No,Thumbnail,,840,840,Monthly,136,,Washington,DC,38.9328,-77.0297,RentLingo,1577359393


### How do I select the first 10 rows with the first five columns?

In [79]:
df.iloc[:10, :5]

Unnamed: 0,id,category,title,body,amenities
0,5668626895,housing/rent/apartment,"Studio apartment 2nd St NE, Uhland Terrace NE,...","This unit is located at second St NE, Uhland T...",
1,5664597177,housing/rent/apartment,Studio apartment 814 Schutte Road,"This unit is located at 814 Schutte Road, Evan...",
2,5668626833,housing/rent/apartment,"Studio apartment N Scott St, 14th St N, Arling...","This unit is located at N Scott St, 14th St N,...",
3,5659918074,housing/rent/apartment,Studio apartment 1717 12th Ave,"This unit is located at 1717 12th Ave, Seattle...",
4,5668626759,housing/rent/apartment,"Studio apartment Washington Blvd, N Cleveland ...","This unit is located at Washington Blvd, N Cle...",
5,5667891676,housing/rent/apartment,0 BR in New York NY 10019,**RARE GEM WITH PRIVATE OUTDOOR TERRACE****AVA...,"Dishwasher,Elevator,Patio/Deck,Pool,Storage"
6,5668627426,housing/rent/apartment,Studio apartment 2432 Penmar Ave,"This unit is located at 2432 Penmar Ave, Venic...",
7,5668626687,housing/rent/apartment,"Studio apartment Oak St NW, 16th St NW, Washin...","This unit is located at Oak St NW, 16th St NW,...",
8,5668610290,housing/rent/apartment,Studio apartment 333 Hyde St,"This unit is located at 333 Hyde St, San Franc...",Refrigerator
9,5668627023,housing/rent/apartment,"Studio apartment A St SE, 19th St SE, Washington","This unit is located at A St SE, 19th St SE, W...",


### Can I select rows by index label and columns by name?

In [80]:
# select rows by index label (5 to 25, both ends included) and columns by name
df.loc[5:25, ["cityname", "state"]]

Unnamed: 0,cityname,state
5,Manhattan,NY
6,Venice,CA
7,Washington,DC
8,San Francisco,CA
9,Washington,DC
10,Washington,DC
11,Washington,DC
12,Tucson,AZ
13,Washington,DC
14,San Francisco,CA
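<code>loc</code> and <code>iloc</code> look similar but slice differently: <code>loc</code> works with index labels and includes the end label, while <code>iloc</code> works with positions and excludes the end position. A small sketch on a made-up dataframe (names and values are assumptions):

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "cityname": ["A", "B", "C", "D", "E"],
    "state": ["NY", "CA", "DC", "CA", "AZ"],
})

by_label = toy.loc[1:3, ["cityname", "state"]]   # labels 1, 2 and 3 (end included)
by_position = toy.iloc[1:3, :2]                  # positions 1 and 2 (end excluded)
print(len(by_label), len(by_position))

# After filtering, labels and positions no longer coincide.
ca_only = toy.loc[toy["state"] == "CA"]
print(ca_only.index.tolist())          # labels of the remaining rows
print(ca_only.iloc[0]["cityname"])     # first remaining row by position
```

With the default index (0, 1, 2, ...) labels and positions happen to be the same numbers, which is why the two are easy to confuse.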


## Conditional selection

In class we learned that we can select a portion of the dataframe based on specific conditions. A condition is a boolean expression on a column, for example <code>df["state"] == "TX"</code>.

- Only one condition
<code>df[condition]</code>

- Two conditions that must both be fulfilled
<code>df[(condition1) & (condition2)]</code>


- Either one condition or the other
<code>df[(condition1) | (condition2)]</code>


Using <code>df[condition]</code> is useful for quick views. In general, <code>df.loc[condition]</code> is recommended because it is explicit and also lets you select columns or assign values in the same step.
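A minimal sketch of these patterns on a made-up dataframe (the name <code>toy</code> and its values are assumptions). Note the parentheses around each comparison when conditions are written inline: <code>&</code> and <code>|</code> bind more tightly than <code>==</code> or <code><</code>.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "cityname": ["Austin", "Austin", "Seattle", "Austin"],
    "pets_allowed": ["Dogs", "Cats", "Dogs", None],
    "price": [925, 1100, 1500, 800],
})

# Each condition is a boolean Series with one True/False per row.
in_austin = toy["cityname"] == "Austin"
dogs_only = toy["pets_allowed"] == "Dogs"

both = toy.loc[in_austin & dogs_only]      # AND: both must be True
either = toy.loc[in_austin | dogs_only]    # OR: at least one must be True
not_austin = toy.loc[~in_austin]           # NOT: invert the condition

# Inline version: every comparison needs its own parentheses.
cheap_austin = toy.loc[(toy["price"] < 1000) & (toy["cityname"] == "Austin")]

print(len(both), len(either), len(not_austin), len(cheap_austin))
```

Python's <code>and</code> / <code>or</code> keywords do not work here, because they expect a single True/False value and not a whole Series.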

### I want to get only those apartments that allow only cats

In [83]:
df.loc[df["pets_allowed"]=="Cats"]

Unnamed: 0,id,category,title,body,amenities,bathrooms,bedrooms,currency,fee,has_photo,pets_allowed,price,price_display,price_type,square_feet,address,cityname,state,latitude,longitude,source,time
40,5668625234,housing/rent/apartment,Studio apartment 420 W. Fullerton Pkwy,"This unit is located at 420 W. Fullerton Pkwy,...","Cable or Satellite,Dishwasher,Internet Access,...",1.0,1.0,USD,No,Thumbnail,Cats,942,942,Monthly,225,420 W Fullerton Parkway,Chicago,IL,41.8625,-87.6825,RentLingo,1577359301
45,5668639708,housing/rent/apartment,Studio apartment 1008 North 109th Street,This unit is located at 1008 North 109th Stree...,,1.0,0.0,USD,No,Thumbnail,Cats,1095,1095,Monthly,231,1008 North 109th St,Seattle,WA,47.6160,-122.3275,RentLingo,1577360334
48,5668636830,housing/rent/apartment,Studio apartment 202 E. HOLLY STREET,"This unit is located at 202 E. HOLLY STREET, B...",,1.0,1.0,USD,No,Thumbnail,Cats,775,775,Monthly,240,202 E Holly St,Bellingham,WA,48.7871,-122.4437,RentLingo,1577360135
185,5668616388,housing/rent/apartment,Studio apartment 404 Chamberlain Ave,"This unit is located at 404 Chamberlain Ave, M...","Parking,Storage",1.0,1.0,USD,No,Thumbnail,Cats,710,710,Monthly,265,404 Chamberlain Avenue,Madison,WI,43.0724,-89.4003,RentLingo,1577358725
189,5664583637,housing/rent/apartment,Studio apartment 621 Begonia Avenue,"This unit is located at 621 Begonia Avenue, Co...",,1.0,0.0,USD,No,Thumbnail,Cats,1900,1900,Monthly,275,621 Begonia Ave,Corona Del Mar,CA,33.6019,-117.8637,RentLingo,1577016114
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9828,5664592650,housing/rent/apartment,Two BR 703 N 13th St. #308,"This unit is located at 703 N 13th St. #308, S...",,2.0,2.0,USD,No,Thumbnail,Cats,1995,1995,Monthly,2539,703 N 13th St #308,Saint Louis,MO,38.6274,-90.3040,RentLingo,1577016704
9837,5664582953,housing/rent/apartment,Two BR 2215 Monument Ave.,"This unit is located at 2215 Monument Ave., Ri...",,2.0,2.0,USD,No,Thumbnail,Cats,2800,2800,Monthly,2600,2215 Monument Avenue,Richmond,VA,37.5300,-77.4770,RentLingo,1577016049
9857,5668627365,housing/rent/apartment,Four BR 22 Gray St,"This unit is located at 22 Gray St, Arlington,...",,3.5,4.0,USD,No,Thumbnail,Cats,4500,4500,Monthly,2700,22 Gray St,Arlington,MA,42.4180,-71.1669,RentLingo,1577359455
9905,5659919613,housing/rent/apartment,Five BR 14815 SE River Forest Dr.,This unit is located at 14815 SE River Forest ...,,2.0,5.0,USD,No,Thumbnail,Cats,3895,3895,Monthly,3000,14815 SE River Forest Drive,Milwaukie,OR,45.4248,-122.6129,RentLingo,1576667875


### I want to get only apartments in Austin that allow only dogs

In [84]:
condition1 = df["cityname"]=="Austin"
condition2 = df["pets_allowed"]=="Dogs"

df.loc[condition1&condition2]

Unnamed: 0,id,category,title,body,amenities,bathrooms,bedrooms,currency,fee,has_photo,pets_allowed,price,price_display,price_type,square_feet,address,cityname,state,latitude,longitude,source,time
431,5668633423,housing/rent/apartment,Studio apartment 2104 San Gabriel St,"This unit is located at 2104 San Gabriel St, A...",,1.0,0.0,USD,No,Thumbnail,Dogs,925,925,Monthly,400,2104 San Gabriel St,Austin,TX,30.3054,-97.7497,RentLingo,1577359911
557,5664594452,housing/rent/apartment,One BR 2108 San Gabriel St,"This unit is located at 2108 San Gabriel St, A...",Alarm,1.0,1.0,USD,No,Thumbnail,Dogs,925,925,Monthly,425,2108 San Gabriel St,Austin,TX,30.3054,-97.7497,RentLingo,1577016823


In [85]:
df.loc[(df["cityname"]=="Los Angeles")&(df["pets_allowed"]=="Cats")]

Unnamed: 0,id,category,title,body,amenities,bathrooms,bedrooms,currency,fee,has_photo,pets_allowed,price,price_display,price_type,square_feet,address,cityname,state,latitude,longitude,source,time
3171,5668627803,housing/rent/apartment,One BR 1724 Butler Ave.,"This unit is located at 1724 Butler Ave., Los ...",,1.0,1.0,USD,No,Thumbnail,Cats,2095,2095,Monthly,690,1724 Butler Avenue,Los Angeles,CA,34.0372,-118.2972,RentLingo,1577359492
3297,5508828320,housing/rent/apartment,"Bright Los Angeles, One BR, One BA for rent","Square footage: 700 square feet, unit number: ...","AC,Cable or Satellite,Clubhouse,Dishwasher,Ele...",1.0,1.0,USD,No,Thumbnail,Cats,2325,2325,Monthly,700,,Los Angeles,CA,33.955,-118.3967,RentDigs.com,1568755678
3302,5664597241,housing/rent/apartment,One BR 3646 Mentone Ave.,"This unit is located at 3646 Mentone Ave., Los...",,1.0,1.0,USD,No,Thumbnail,Cats,2095,2095,Monthly,700,3646 Mentone Avenue,Los Angeles,CA,34.0372,-118.2972,RentLingo,1577017069
3799,5508832745,housing/rent/apartment,One BR Apartment - Choose from a spacious studio.,One or two beds floorplan Manor Apartments. Un...,"AC,Cable or Satellite,Dishwasher,Elevator,Fire...",1.0,1.0,USD,No,Thumbnail,Cats,2135,2135,Monthly,725,,Los Angeles,CA,34.1056,-118.3668,RentDigs.com,1568755968
4119,5508742208,housing/rent/apartment,Apartment in move in condition in Los Angeles,"Square footage: 750 sq-ft, unit number: 117. S...","AC,Fireplace,Gym,Parking,Pool",1.0,1.0,USD,No,Thumbnail,Cats,3800,3800,Monthly,750,,Los Angeles,CA,34.063,-118.4363,RentDigs.com,1568749134
4122,5508817208,housing/rent/apartment,Welcome home to Sepulveda West Apartments!,Experience a sense of beautiful apartment comm...,"AC,Clubhouse,Gym,Pool,Refrigerator",1.0,1.0,USD,No,Thumbnail,Cats,2595,2595,Monthly,750,,Los Angeles,CA,33.955,-118.3967,RentDigs.com,1568754936
4125,5648134653,housing/rent/apartment,"One BR 9005, 9015 Burton Way","This unit is located at 9005, 9015 Burton Way,...",,1.0,1.0,USD,No,Thumbnail,Cats,2350,2350,Monthly,750,9005 9015 Burton Way,Los Angeles,CA,34.0372,-118.2972,RentLingo,1575978067
5340,5668638506,housing/rent/apartment,Two BR 3810 Wade St.,"This unit is located at 3810 Wade St., Los Ang...",,2.0,2.0,USD,No,Thumbnail,Cats,2695,2695,Monthly,840,3810 Wade St,Los Angeles,CA,34.0372,-118.2972,RentLingo,1577360242
5433,5668638079,housing/rent/apartment,Two BR 3724-36 Inglewood Blvd.,This unit is located at 3724-36 Inglewood Blvd...,,1.0,2.0,USD,No,Thumbnail,Cats,2795,2795,Monthly,850,3724-36 Inglewood Boulevard,Los Angeles,CA,34.0372,-118.2972,RentLingo,1577360200
5874,5659916336,housing/rent/apartment,Two BR 1339 Barry Ave.,"This unit is located at 1339 Barry Ave., Los A...",,2.0,2.0,USD,No,Thumbnail,Cats,2995,2995,Monthly,895,1339 Barry Avenue,Los Angeles,CA,34.0372,-118.2972,RentLingo,1576667598


### How much is the most expensive apartment?

In [44]:
df["price"].max()

np.int64(52500)

### I want to get the row of the most expensive apartment

In [86]:
df.loc[df["price"]==52500]

Unnamed: 0,id,category,title,body,amenities,bathrooms,bedrooms,currency,fee,has_photo,pets_allowed,price,price_display,price_type,square_feet,address,cityname,state,latitude,longitude,source,time
8829,5666447277,housing/rent/apartment,Studio apartment for rent,Barstow It's 14/18ft. studio apartment furnish...,"AC,Cable or Satellite,Internet Access,Patio/De...",1.0,0.0,USD,No,Thumbnail,,52500,52500,Monthly,1418,1101 Pueblo Drive,Barstow,CA,34.887,-117.035,RentDigs.com,1577185712


In [87]:
#alternative
df.loc[df["price"]==df["price"].max()]

Unnamed: 0,id,category,title,body,amenities,bathrooms,bedrooms,currency,fee,has_photo,pets_allowed,price,price_display,price_type,square_feet,address,cityname,state,latitude,longitude,source,time
8829,5666447277,housing/rent/apartment,Studio apartment for rent,Barstow It's 14/18ft. studio apartment furnish...,"AC,Cable or Satellite,Internet Access,Patio/De...",1.0,0.0,USD,No,Thumbnail,,52500,52500,Monthly,1418,1101 Pueblo Drive,Barstow,CA,34.887,-117.035,RentDigs.com,1577185712
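Another option, not used in class, is <code>idxmax()</code>, which returns the index label of the maximum, or <code>nlargest()</code>. A small sketch on a made-up dataframe (names and values are assumptions):

```python
import pandas as pd

# Made-up data for illustration only; the index labels are deliberately not 0, 1, 2.
toy = pd.DataFrame(
    {"cityname": ["A", "B", "C"], "price": [790, 52500, 1390]},
    index=[10, 20, 30],
)

top_label = toy["price"].idxmax()     # index label of the (first) maximum
top_row = toy.loc[[top_label]]        # double brackets keep the DataFrame format

top_row_alt = toy.nlargest(1, "price")   # same row via nlargest

print(top_label)
print(top_row)
```

One difference to keep in mind: comparing with <code>df["price"].max()</code> returns every row that shares the maximum price, while <code>idxmax()</code> only points to the first one.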


### I want only those apartments that cost less than or equal to 1000

In [88]:
df.loc[df["price"]<=1000]

Unnamed: 0,id,category,title,body,amenities,bathrooms,bedrooms,currency,fee,has_photo,pets_allowed,price,price_display,price_type,square_feet,address,cityname,state,latitude,longitude,source,time
0,5668626895,housing/rent/apartment,"Studio apartment 2nd St NE, Uhland Terrace NE,...","This unit is located at second St NE, Uhland T...",,,0.0,USD,No,Thumbnail,,790,790,Monthly,101,,Washington,DC,38.9057,-76.9861,RentLingo,1577359415
1,5664597177,housing/rent/apartment,Studio apartment 814 Schutte Road,"This unit is located at 814 Schutte Road, Evan...",,,1.0,USD,No,Thumbnail,,425,425,Monthly,106,814 Schutte Rd,Evansville,IN,37.9680,-87.6621,RentLingo,1577017063
3,5659918074,housing/rent/apartment,Studio apartment 1717 12th Ave,"This unit is located at 1717 12th Ave, Seattle...",,1.0,0.0,USD,No,Thumbnail,,925,925,Monthly,116,1717 12th Avenue,Seattle,WA,47.6160,-122.3275,RentLingo,1576667743
4,5668626759,housing/rent/apartment,"Studio apartment Washington Blvd, N Cleveland ...","This unit is located at Washington Blvd, N Cle...",,,0.0,USD,No,Thumbnail,,880,880,Monthly,125,,Arlington,VA,38.8738,-77.1055,RentLingo,1577359401
7,5668626687,housing/rent/apartment,"Studio apartment Oak St NW, 16th St NW, Washin...","This unit is located at Oak St NW, 16th St NW,...",,,0.0,USD,No,Thumbnail,,840,840,Monthly,136,,Washington,DC,38.9328,-77.0297,RentLingo,1577359393
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9547,5664595590,housing/rent/apartment,Four BR 21 Idylwood Drive,"This unit is located at 21 Idylwood Drive, Cul...",,4.0,4.0,USD,No,Thumbnail,"Cats,Dogs",575,575,Monthly,1950,21 Idylwood Dr,Cullowhee,NC,35.2850,-83.1791,RentLingo,1577016906
9557,5650235161,housing/rent/apartment,217 Binford SW Dr,Come visit Spring Branch apartments today and ...,"Dishwasher,Refrigerator",2.0,4.0,USD,No,Thumbnail,,665,665,Monthly,1968,217 Binford SW Drive,Huntsville,AL,34.7191,-86.5855,ListedBuy,1576100663
9710,5668616581,housing/rent/apartment,Four BR 11 Maverick Dr,"This unit is located at eleven Maverick Dr, Pe...","Dishwasher,Garbage Disposal,Parking,Patio/Deck...",3.0,4.0,USD,No,Thumbnail,"Cats,Dogs",430,430,Monthly,2214,11 Maverick Drive,Pendleton,SC,34.6464,-82.7686,RentLingo,1577358737
9713,5509096590,housing/rent/apartment,Two BR - Come home to this roomy 2nd floor apa...,"Dedicated dining area room, Galley Kitchen equ...","Refrigerator,Storage,Washer Dryer",1.0,2.0,USD,No,Yes,,595,595,Monthly,2216,,Dayton,OH,39.7942,-84.2701,RentDigs.com,1568773159


### I want to get only those apartments in Texas (TX) and save them into another variable

In [89]:
texas_df = df.loc[df["state"]=="TX"]

In [90]:
#check the structure of texas_df
texas_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1737 entries, 17 to 9982
Data columns (total 22 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             1737 non-null   int64  
 1   category       1737 non-null   object 
 2   title          1737 non-null   object 
 3   body           1737 non-null   object 
 4   amenities      1217 non-null   object 
 5   bathrooms      1730 non-null   float64
 6   bedrooms       1737 non-null   float64
 7   currency       1737 non-null   object 
 8   fee            1737 non-null   object 
 9   has_photo      1737 non-null   object 
 10  pets_allowed   890 non-null    object 
 11  price          1737 non-null   int64  
 12  price_display  1737 non-null   object 
 13  price_type     1737 non-null   object 
 14  square_feet    1737 non-null   int64  
 15  address        1360 non-null   object 
 16  cityname       1737 non-null   object 
 17  state          1737 non-null   object 
 18  latitude    

### The row indexes are no longer consecutive, I want to reset them

In [91]:
texas_df = texas_df.reset_index(drop=True)
texas_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1737 entries, 0 to 1736
Data columns (total 22 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             1737 non-null   int64  
 1   category       1737 non-null   object 
 2   title          1737 non-null   object 
 3   body           1737 non-null   object 
 4   amenities      1217 non-null   object 
 5   bathrooms      1730 non-null   float64
 6   bedrooms       1737 non-null   float64
 7   currency       1737 non-null   object 
 8   fee            1737 non-null   object 
 9   has_photo      1737 non-null   object 
 10  pets_allowed   890 non-null    object 
 11  price          1737 non-null   int64  
 12  price_display  1737 non-null   object 
 13  price_type     1737 non-null   object 
 14  square_feet    1737 non-null   int64  
 15  address        1360 non-null   object 
 16  cityname       1737 non-null   object 
 17  state          1737 non-null   object 
 18  latitude
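What <code>reset_index</code> does is easier to see on a tiny example. A sketch on a made-up dataframe (names and values are assumptions):

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "state": ["NY", "TX", "CA", "TX"],
    "price": [2475, 925, 1800, 300],
})

tx = toy.loc[toy["state"] == "TX"]
print(tx.index.tolist())             # the filtered rows keep their original labels

tx_reset = tx.reset_index(drop=True)
print(tx_reset.index.tolist())       # fresh labels starting at 0

# Without drop=True the old labels are kept as a new column called "index".
tx_kept = tx.reset_index()
print(tx_kept.columns.tolist())
```

Use <code>drop=True</code> when the old labels carry no meaning. Leave it out if you still want to trace each row back to the original dataframe.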

### What is the price of the cheapest apartment in Texas?

In [92]:
price = texas_df["price"].min()
print(price)

300


# Homework

Continuing with this dataset, answer the following questions by providing the respective code.

In [94]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 22 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             10000 non-null  int64  
 1   category       10000 non-null  object 
 2   title          10000 non-null  object 
 3   body           10000 non-null  object 
 4   amenities      6451 non-null   object 
 5   bathrooms      9966 non-null   float64
 6   bedrooms       9993 non-null   float64
 7   currency       10000 non-null  object 
 8   fee            10000 non-null  object 
 9   has_photo      10000 non-null  object 
 10  pets_allowed   5837 non-null   object 
 11  price          10000 non-null  int64  
 12  price_display  10000 non-null  object 
 13  price_type     10000 non-null  object 
 14  square_feet    10000 non-null  int64  
 15  address        6673 non-null   object 
 16  cityname       9923 non-null   object 
 17  state          9923 non-null   object 
 18  latitud

### How many categories of apartment exist in the dataset?

### What does the dataset mean by amenities? How many different amenities are there and how frequently do they appear in the whole dataset?

### Rename the column "state" to "state_code"

### Create a new column called "statename" where you put the names of the states based on the column "state_code"

Hint: You can ask ChatGPT to create a dictionary with the state codes and state names
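Once you have such a dictionary, <code>Series.map</code> can translate codes into names. The sketch below uses unrelated made-up data (language codes, the column names <code>lang</code>, <code>lang_code</code> and <code>lang_name</code> are all assumptions), so the same idea still has to be adapted to the dataset:

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({"lang": ["en", "de", "en", "xx"]})

# rename returns a new dataframe, so assign the result back.
toy = toy.rename(columns={"lang": "lang_code"})

# Deliberately incomplete dictionary: "xx" has no entry.
code_to_name = {"en": "English", "de": "German"}
toy["lang_name"] = toy["lang_code"].map(code_to_name)

print(toy)
```

Codes that are missing from the dictionary become NaN, so checking <code>isna().sum()</code> on the new column is a quick way to spot gaps in the dictionary.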

### Create a cross table or contingency table with the counts of the two variables "bathrooms" and "bedrooms" in order to see the counts of all different combinations of bathrooms and bedrooms

Hint: use pd.crosstab function
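As a reminder of how <code>pd.crosstab</code> works, here is a sketch on unrelated made-up data (the columns <code>color</code> and <code>size</code> are assumptions): the first argument becomes the rows, the second the columns, and each cell holds the number of rows with that combination.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "color": ["red", "red", "blue", "red", "blue"],
    "size":  ["S",   "M",   "M",    "M",   "M"],
})

table = pd.crosstab(toy["color"], toy["size"])
print(table)

# margins=True adds row and column totals labelled "All".
table_with_totals = pd.crosstab(toy["color"], toy["size"], margins=True)
```

Combinations that never occur show up as 0 in the table.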

### Is there only US dollars as currency? If not, which other ones?

### Create two different dataframes: west_df and east_df

west_df is a dataframe containing only those apartments on the west coast of the US.

east_df is a dataframe containing only those apartments on the east coast of the US.

### Where is it more expensive to live: on the west coast or on the east coast?

Hint: Calculate the means or averages of apartment prices. Don't run statistical tests

### How many apartments in Florida have a pool?

### In which states of the west coast are the top 5 biggest apartments?

Hint: use the df.sort_values() function (df.sort() does not exist in current pandas versions)
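A short sketch of sorting on unrelated made-up data (the columns <code>name</code> and <code>area</code> are assumptions). <code>sort_values</code> sorts in ascending order by default, so <code>ascending=False</code> is needed to get the largest values first:

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "area": [700, 3000, 1200, 2500],
})

# Sort from largest to smallest and keep the first two rows.
top2 = toy.sort_values("area", ascending=False).head(2)
print(top2)

# nlargest gives the same rows in one call.
top2_alt = toy.nlargest(2, "area")
```

Both versions return a new dataframe and leave <code>toy</code> unchanged.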

### Is there a prominent source on the west coast and on the east coast?

Hint: the column source contains the (internet) sites where the data was gathered.

### Create a new dataframe with all the cities on the border with Mexico

The new dataframe should be called mex_df

### How many bedrooms and bathrooms are there on average in the cities from mex_df?

### Among the top 10 cheapest cities next to the Mexican border, do all of them have AC as an amenity?