<a href="https://colab.research.google.com/github/MonkeyWrenchGang/MGTPython/blob/main/module_2/2_subsetting_columns.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Selecting or Subsetting Columns 

A pandas DataFrame is a two-dimensional table of data with rows and columns. One way to access specific columns in a DataFrame is to subset it with single brackets `[]` or double brackets `[[]]`: write the DataFrame's name, then put the column name (or a list of column names) inside the brackets.

For example, if we have a DataFrame called `df` and we want to subset the column "Name", we would use the code `df["Name"]`. This returns a pandas Series object containing only the "Name" column. If you use double brackets instead, `df[["Name"]]`, the result is a DataFrame. You can also subset multiple columns by passing in a list of column names: `df[["Name", "Age"]]`.
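
As a quick illustration of the single versus double bracket behavior, here is a minimal sketch. The DataFrame `df` and its values are hypothetical toy data, made up only to show the return types.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the return types
df = pd.DataFrame({"Name": ["Ana", "Ben", "Cy"], "Age": [31, 25, 47]})

name_series = df["Name"]         # single brackets -> Series
name_frame = df[["Name"]]        # double brackets -> one-column DataFrame
name_age = df[["Name", "Age"]]   # list of names -> multi-column DataFrame

print(type(name_series).__name__)  # Series
print(type(name_frame).__name__)   # DataFrame
print(name_age.shape)              # (3, 2)
```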


---
## Clearing up the Series vs DataFrame confusion 

A pandas **DataFrame** is a two-dimensional table of data with rows and columns, while a pandas **Series** is a one-dimensional array-like object that can hold any data type.

In a DataFrame, each column can have a different data type, and you can perform various operations on the rows and columns, such as sorting, filtering, and aggregation.

A Series, on the other hand, has only one dimension: it has an index but no column labels (just an optional name). A Series has a single data type (dtype) for all of its elements, and operations on a Series are applied across the entire Series at once.
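
The difference can be inspected directly through a few attributes. The sketch below uses a small hypothetical DataFrame (the values are illustrative only) and looks at dimensions, dtypes, and the Series name.

```python
import pandas as pd

# Hypothetical toy data with three differently typed columns
df = pd.DataFrame({
    "name": ["A", "B"],
    "price": [307, 374],
    "latitude": [25.96, 25.95],
})

print(df.ndim)     # 2 -> rows and columns
print(df.dtypes)   # one dtype per column

s = df["price"]
print(s.ndim)      # 1 -> index only
print(s.dtype)     # int64
print(s.name)      # price
```

The Series keeps the column label as its `name`, which is why a Series has no header row when printed but still reports `Name: price` at the bottom.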



---

In this tutorial we'll look at subsetting a pandas DataFrame's columns and at how we can combine columns with the `concat()` function. The `concat()` function in pandas is used to concatenate multiple pandas objects (such as DataFrames or Series) along a specific axis (either rows or columns). Here are our steps:

1. import a dataset 
2. select a single column of data
3. select multiple columns of data 
4. concatenate two series objects into a dataframe 
5. concatenate a series and dataframe object into a new dataframe 




# Import Libraries 

As always, we start by importing the necessary libraries and configuring our environment. 

In [1]:
# -- notebook options -- 
from IPython.display import display, HTML, clear_output
display(HTML("<style>.container { width:90% }</style>"))
import warnings
warnings.filterwarnings('ignore')
# ------------------------------------------------------------------

# -- key libraries --
import pandas as pd


# 1. Let's import a CSV File

Download the CSV file from Canvas to a location on your computer or on Google Drive. As a helper, I've also included a link to the GitHub-hosted file.

Broward County AirBnB listings from GitHub:

- https://raw.githubusercontent.com/MonkeyWrenchGang/MGTPython/main/module_2/data/broward_listings.csv

```python
# import data 
listings = pd.read_csv("https://raw.githubusercontent.com/MonkeyWrenchGang/MGTPython/main/module_2/data/broward_listings.csv")
listings.head()
```

In [2]:
listings = pd.read_csv("https://raw.githubusercontent.com/MonkeyWrenchGang/MGTPython/main/module_2/data/broward_listings.csv")
listings.head()

|  | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 | number_of_reviews_ltm | license |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 43366490 | Miami Huge Heated Private Pool (winter) | 280056 | Monique |  | West Park | 25.96367 | -80.18432 | Entire home/apt | 307 | 3 | 33 | 2022-09-18 | 1.16 | 2 | 57 | 12 | 2208974900.0 |
| 1 | 25564658 | Miami Private HOME to Explore South Florida | 43677351 | Matt |  | West Park | 25.95616 | -80.17799 | Entire home/apt | 374 | 3 | 31 | 2022-09-05 | 0.6 | 2 | 361 | 13 | 2021066786.0 |
| 2 | 621066054586333393 | Better Choice for your Vacation! Two Convenien... | 25138314 | RoomPicks By Antony |  | Hallandale Beach | 25.95554 | -80.12043 | Private room | 268 | 2 | 0 |  |  | 13 | 86 | 0 |  |
| 3 | 41829907 | Listing that fits all of your vacation needs. | 330607161 | Anselm |  | Miramar | 25.95453 | -80.23764 | Entire home/apt | 104 | 1 | 56 | 2022-08-31 | 1.75 | 1 | 122 | 25 |  |
| 4 | 50178608 | 2 Gorgeous Units! Pool, Restaurant, & Bar Onsite | 25138314 | RoomPicks By Antony |  | Hallandale Beach | 25.95506 | -80.1206 | Private room | 294 | 2 | 0 |  |  | 13 | 86 | 0 |  |


# 2. Create a Series Object 

Our listings table (DataFrame) contains AirBnB listing data for Broward County, Florida. Suppose you are asked to create a Series object called `abnb_prices` containing the "price" field. How would you do this?

```python
# create price series 
abnb_prices = listings["price"]
abnb_prices.head(10)
```

Here we use a single set of brackets `[ ]` to subset our column. The result is a Series object holding a single column of values. Give it a try:


In [8]:
# create price series 
abnb_prices = listings["price"]
abnb_prices.head(10)

0    307
1    374
2    268
3    104
4    294
5    386
6    549
7    180
8    425
9    255
Name: price, dtype: int64

You'll note that it does not contain a column header, but it does contain an index. Suppose you wanted to grab the 5th price. You can do this by referencing index 4 (remember, Python indexing starts at 0). Because the index here is the default 0, 1, 2, ... labels, label 4 and position 4 point to the same element. Try it:

```python 
abnb_prices[4]
```

In [6]:
abnb_prices[4]

294
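
One caveat worth sketching: with an integer index, plain brackets on a Series look up the index label, not the position. The two only coincide while the index is the default 0, 1, 2, ... The example below rebuilds the first five prices shown above as a standalone Series and compares `[]`, `.loc` (label) and `.iloc` (position); the name `expensive` is introduced here purely for illustration.

```python
import pandas as pd

# First five prices from the listings output above
prices = pd.Series([307, 374, 268, 104, 294], name="price")

print(prices[4])       # label lookup on the default index -> 294
print(prices.loc[4])   # explicit label lookup -> 294
print(prices.iloc[4])  # explicit position lookup -> 294

# After filtering, labels no longer match positions
expensive = prices[prices > 290]   # keeps index labels 0, 1 and 4
print(expensive.index.tolist())    # [0, 1, 4]
print(expensive.loc[4])            # value whose label is 4 -> 294
print(expensive.iloc[2])           # third remaining value -> 294
```

When you mean "the n-th element regardless of labels", `.iloc` is the safer choice.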

## 2.1 Suppose you used double brackets instead? 

If you use double brackets `[[ ]]`, the result is a DataFrame instead. Let's see what that looks like. 

```python
# create price table 
abnb_prices_table = listings[["price"]]
abnb_prices_table.head(10)
```

In [9]:
# create price table 
abnb_prices_table = listings[["price"]]
abnb_prices_table.head(10)

|  | price |
|---|---|
| 0 | 307 |
| 1 | 374 |
| 2 | 268 |
| 3 | 104 |
| 4 | 294 |
| 5 | 386 |
| 6 | 549 |
| 7 | 180 |
| 8 | 425 |
| 9 | 255 |


See the difference? You now have a table with a single column. See what happens if you try to reference the 5th price with `abnb_prices_table[4]` versus `abnb_prices_table.loc[4]`. One raises an error while the other doesn't. The key difference is that single brackets on a DataFrame look up a column label (and there is no column named 4), while `.loc[4]` looks up the row whose index label is 4. A Series, by contrast, can be indexed much like an array or a list. 

```python
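# run each line in its own cell
abnb_prices_table[4]      # KeyError: no column is labeled 4
abnb_prices_table.loc[4]  # returns the row whose index label is 4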
```

price    294
Name: 4, dtype: int64
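
To see both behaviors in a single cell without the error stopping execution, the failing lookup can be wrapped in `try`/`except`. This is a minimal sketch on a standalone one-column table rebuilt from the first five prices above; the variable names `table`, `row` and `value` are chosen here for illustration.

```python
import pandas as pd

# First five prices from the listings output above
table = pd.DataFrame({"price": [307, 374, 268, 104, 294]})

try:
    table[4]                     # looks for a COLUMN labeled 4
except KeyError as err:
    print("KeyError:", err)

row = table.loc[4]               # row with index label 4, returned as a Series
value = table.loc[4, "price"]    # row label plus column label -> single value

print(row)
print(value)                     # 294
```

Passing both a row label and a column label to `.loc` returns the scalar directly, which is usually what you want when pulling one value out of a DataFrame.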

# 3. Select Multiple Columns

Using double brackets `[[]]` to subset a DataFrame allows us to select multiple columns, like this: `df[["Name", "Age"]]`.

Suppose you were asked to create a DataFrame from listings containing only name, room_type and price. Use double brackets to create a new dataset, `property_prices`, like this: 

```python
property_prices = listings[["name","room_type","price"]]
property_prices.head()
```

In [15]:
property_prices = listings[["name","room_type","price"]]
property_prices.head()

|  | name | room_type | price |
|---|---|---|---|
| 0 | Miami Huge Heated Private Pool (winter) | Entire home/apt | 307 |
| 1 | Miami Private HOME to Explore South Florida | Entire home/apt | 374 |
| 2 | Better Choice for your Vacation! Two Convenien... | Private room | 268 |
| 3 | Listing that fits all of your vacation needs. | Entire home/apt | 104 |
| 4 | 2 Gorgeous Units! Pool, Restaurant, & Bar Onsite | Private room | 294 |


# 4. Concatenate 

The concat() function in pandas is used to concatenate multiple pandas objects (such as DataFrames or Series) along a specific axis (either rows or columns).

The basic syntax:
```python
pd.concat(objs, axis=0, join='outer', ignore_index=False)
```
- objs: a list or a collection of pandas objects that you want to concatenate.
- axis: the axis along which the concatenation should be performed. The default value is 0, which concatenates the objects along the rows (i.e. stacks them vertically). If you set the axis to 1, the objects will be concatenated along the columns (i.e. join them horizontally).
- join: how to handle labels on the other axis. The default is 'outer', which keeps the union of the labels; the only other option is 'inner', which keeps just the labels shared by all objects. **More on joins later**
- ignore_index: whether to discard the index labels along the concatenation axis and number the result 0, 1, ..., n - 1 instead. The default is False.
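
The effect of `join` and `ignore_index` is easiest to see on two small frames whose columns only partly overlap. The sketch below uses hypothetical toy frames `a` and `b` (not part of the listings data) and stacks them along the rows.

```python
import pandas as pd

# Hypothetical toy frames that share only the "price" column
a = pd.DataFrame({"name": ["A", "B"], "price": [307, 374]})
b = pd.DataFrame({"price": [268, 104],
                  "room_type": ["Private room", "Entire home/apt"]})

outer = pd.concat([a, b], axis=0, join="outer")       # union of columns, gaps become NaN
inner = pd.concat([a, b], axis=0, join="inner")       # only the shared column
reset = pd.concat([a, b], axis=0, ignore_index=True)  # fresh 0..n-1 index

print(outer.columns.tolist())  # ['name', 'price', 'room_type']
print(inner.columns.tolist())  # ['price']
print(outer.index.tolist())    # [0, 1, 0, 1]
print(reset.index.tolist())    # [0, 1, 2, 3]
```

Note the repeated index labels in `outer`: unless `ignore_index=True` is set, each object keeps its original index after concatenation.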



---
Suppose you are given two Series objects, `property_names` and `property_prices`, and you need to concatenate them into `property_names_and_prices`. Here's how you could do this: 

```python
# create series objects 
property_names = listings["name"]
property_prices = listings["price"]

# append them together 
property_names_and_prices = pd.concat([property_names,property_prices],axis=1)

# eyeball the result
property_names_and_prices.head()
```



In [19]:
# create series objects 
property_names = listings["name"]
property_prices = listings["price"]

# append them together 
property_names_and_prices = pd.concat([property_names,property_prices],axis=1)

property_names_and_prices.head()



|  | name | price |
|---|---|---|
| 0 | Miami Huge Heated Private Pool (winter) | 307 |
| 1 | Miami Private HOME to Explore South Florida | 374 |
| 2 | Better Choice for your Vacation! Two Convenien... | 268 |
| 3 | Listing that fits all of your vacation needs. | 104 |
| 4 | 2 Gorgeous Units! Pool, Restaurant, & Bar Onsite | 294 |


Simple, right? Change the code above to use `axis=0` instead. What happens? 
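
If you want to check your answer, here is a small sketch of the two axes side by side. It uses two short hypothetical Series (`names` and `prices`, with placeholder listing names) rather than the full dataset.

```python
import pandas as pd

# Hypothetical two-row Series standing in for the name and price columns
names = pd.Series(["Listing A", "Listing B"], name="name")
prices = pd.Series([307, 374], name="price")

side_by_side = pd.concat([names, prices], axis=1)  # DataFrame with 2 rows and 2 columns
stacked = pd.concat([names, prices], axis=0)       # one longer Series with 4 values

print(side_by_side.shape)       # (2, 2)
print(type(stacked).__name__)   # Series
print(len(stacked))             # 4
print(stacked.index.tolist())   # [0, 1, 0, 1]
```

With `axis=0` the prices are stacked underneath the names in a single Series, so text and numbers end up mixed together, which is rarely what you want for two different fields.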

# 5. Concatenate a Series and a DataFrame into a new DataFrame 

Suppose we have our `property_prices` DataFrame and we want to concatenate a Series that contains latitude and another Series that contains longitude. Can we do that too? For sure! 

```python
# create a DataFrame and two Series objects 
property_prices = listings[["name","room_type","price"]]
property_lat = listings["latitude"]
property_lon = listings["longitude"]

# append them together 
property_prices_lat_lon = pd.concat([property_prices,property_lat, property_lon],axis=1)

property_prices_lat_lon.head()
``` 

In [20]:
# create a DataFrame and two Series objects 
property_prices = listings[["name","room_type","price"]]
property_lat = listings["latitude"]
property_lon = listings["longitude"]

# append them together 
property_prices_lat_lon = pd.concat([property_prices,property_lat, property_lon],axis=1)

property_prices_lat_lon.head()

|  | name | room_type | price | latitude | longitude |
|---|---|---|---|---|---|
| 0 | Miami Huge Heated Private Pool (winter) | Entire home/apt | 307 | 25.96367 | -80.18432 |
| 1 | Miami Private HOME to Explore South Florida | Entire home/apt | 374 | 25.95616 | -80.17799 |
| 2 | Better Choice for your Vacation! Two Convenien... | Private room | 268 | 25.95554 | -80.12043 |
| 3 | Listing that fits all of your vacation needs. | Entire home/apt | 104 | 25.95453 | -80.23764 |
| 4 | 2 Gorgeous Units! Pool, Restaurant, & Bar Onsite | Private room | 294 | 25.95506 | -80.1206 |
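
Since every piece here comes from the same DataFrame, the concatenated result should match what you would get by selecting all five columns at once with double brackets. The sketch below checks this on a two-row sample copied from the output above; `listings_sample`, `via_concat` and `via_brackets` are names introduced only for this illustration.

```python
import pandas as pd

# Two rows copied from the listings output above
listings_sample = pd.DataFrame({
    "name": ["Miami Huge Heated Private Pool (winter)",
             "Miami Private HOME to Explore South Florida"],
    "room_type": ["Entire home/apt", "Entire home/apt"],
    "price": [307, 374],
    "latitude": [25.96367, 25.95616],
    "longitude": [-80.18432, -80.17799],
})

cols = ["name", "room_type", "price", "latitude", "longitude"]

# Build the table piece by piece with concat ...
via_concat = pd.concat(
    [listings_sample[["name", "room_type", "price"]],
     listings_sample["latitude"],
     listings_sample["longitude"]],
    axis=1,
)

# ... and directly with double brackets
via_brackets = listings_sample[cols]

print(via_concat.columns.tolist() == cols)  # True
print(via_concat.equals(via_brackets))      # True
```

In practice, `concat()` along `axis=1` is most useful when the pieces come from different sources; when they all live in one DataFrame, double brackets are the shorter route.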


# Conclusion 

So here we have looked at how to subset a single column into a Series object and one or more columns into a DataFrame, and at how to combine them again with `concat()`. 

- single brackets `[]` return a Series object 
- single brackets are used to subset a single column 
- double brackets `[[]]` return a DataFrame
- double brackets are used to subset one or more columns by passing a list of column names 