### Combining rows of data
The dataset you'll be working with here relates to NYC Uber data. The original dataset has all the originating Uber pickup locations by time and latitude and longitude. For didactic purposes, you'll be working with a very small portion of the actual data.

Three DataFrames have been pre-loaded: uber1, which contains data for April 2014, uber2, which contains data for May 2014, and uber3, which contains data for June 2014. Your job in this exercise is to concatenate these DataFrames together such that the resulting DataFrame has the data for all three months.

Begin by exploring the structure of these three DataFrames in the IPython Shell using methods such as .head().

In [1]:
import pandas as pd

uber = pd.read_csv('nyc_uber_2014.csv')

In [8]:
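# Keep the rows whose timestamp contains the given date string
# (a small sample from each month: April, May and June 2014)
uber1 = uber[uber['Date/Time'].str.contains("4/1/2014")]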
uber2 = uber[uber['Date/Time'].str.contains("5/1/2014")]
uber3 = uber[uber['Date/Time'].str.contains("6/1/2014")]

In [9]:
# Concatenate uber1, uber2, and uber3: row_concat
row_concat = pd.concat([uber1, uber2, uber3])

# Print the shape of row_concat
print(row_concat.shape)

# Print the head of row_concat
print(row_concat.head())



(297, 5)
   Unnamed: 0         Date/Time      Lat      Lon    Base
0           0  4/1/2014 0:11:00  40.7690 -73.9549  B02512
1           1  4/1/2014 0:17:00  40.7267 -74.0345  B02512
2           2  4/1/2014 0:21:00  40.7316 -73.9873  B02512
3           3  4/1/2014 0:28:00  40.7588 -73.9776  B02512
4           4  4/1/2014 0:33:00  40.7594 -73.9722  B02512
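
One detail worth checking after a row-wise concatenation is the index. The sketch below uses two tiny made-up DataFrames (`april` and `may` are hypothetical stand-ins, not the real Uber data) to show that `pd.concat` keeps each piece's original row labels unless `ignore_index=True` is passed.

```python
import pandas as pd

# Hypothetical stand-ins for two separately loaded monthly DataFrames
april = pd.DataFrame({"Date/Time": ["4/1/2014 0:11:00", "4/1/2014 0:17:00"],
                      "Base": ["B02512", "B02512"]})
may = pd.DataFrame({"Date/Time": ["5/1/2014 0:02:00", "5/1/2014 0:06:00"],
                    "Base": ["B02512", "B02512"]})

# Default behaviour: the original row labels are kept, so 0 and 1 repeat
kept = pd.concat([april, may])
print(kept.index.tolist())        # [0, 1, 0, 1]

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([april, may], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Repeated labels matter as soon as rows are selected with `.loc`, since one label can then return several rows.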


### Globbing
In order to concatenate DataFrames:

- They must be in a list.
- Each dataset can be loaded individually if there are only a few.

When there are too many files to load one by one, we can use the glob function to find files based on a pattern. Globbing is a simple way for Python to do pattern matching on file names. We can use wildcards such as * and ? to specify the file name pattern we are looking for.

A wildcard is a symbol that stands in for other characters in a pattern. The two most common ones are:

- * matches any string, e.g. *.csv matches any file name ending in .csv.
- ? matches exactly one character, e.g. file_?.csv matches file_a.csv, file_b.csv and so on.

Globbing returns a list of file names, which can then be used to load the files into separate DataFrames.
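
To see how the two wildcards behave without touching the file system, the patterns can be tried against a plain list of names. This is a small sketch using the standard library's `fnmatch` module, which implements the same matching rules that `glob` applies; the file names are made up for illustration.

```python
import fnmatch

# Hypothetical file names, used only to illustrate the wildcards
names = ["file_a.csv", "file_b.csv", "file_10.csv", "notes.txt"]

# * matches any run of characters, so every .csv name is kept
star_matches = fnmatch.filter(names, "*.csv")
print(star_matches)      # ['file_a.csv', 'file_b.csv', 'file_10.csv']

# ? matches exactly one character, so file_10.csv is excluded
question_matches = fnmatch.filter(names, "file_?.csv")
print(question_matches)  # ['file_a.csv', 'file_b.csv']
```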

In [None]:
import glob
import pandas as pd

# glob.glob returns a list of file names matching the pattern
csv_files = glob.glob('*.csv')

# Load each file into its own DataFrame and collect them in a list
list_data = []
for filename in csv_files:
    data = pd.read_csv(filename)
    list_data.append(data)

# Concatenate the list of DataFrames into a single DataFrame
pd.concat(list_data)
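
The same pattern can be tried end to end without any existing data. The following is a minimal sketch that writes three throwaway CSV files to a temporary directory and then globs and concatenates them; the `uber_?.csv` file names and the single `month` column are assumptions made for the example, not the layout of the real dataset.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write three tiny CSV files (hypothetical names and contents)
    for month in (4, 5, 6):
        path = os.path.join(tmp_dir, f"uber_{month}.csv")
        pd.DataFrame({"month": [month, month]}).to_csv(path, index=False)

    # glob returns matches in arbitrary order, so sort for a stable result
    csv_files = sorted(glob.glob(os.path.join(tmp_dir, "uber_?.csv")))

    # Read every matching file while the temporary directory still exists
    frames = [pd.read_csv(filename) for filename in csv_files]

# Stack the DataFrames row-wise and renumber the index from 0
combined = pd.concat(frames, ignore_index=True)
print(combined.shape)  # (6, 1)
```

Sorting the file list is optional, but it keeps the row order reproducible between runs.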

### 1-to-1 data merge
Merging data allows you to combine disparate datasets into a single dataset to do more complex analysis.

Here, you'll be using survey data that contains readings that William Dyer, Frank Pabodie, and Valentina Roerich took in the late 1920s and 1930s while they were on an expedition towards Antarctica. The dataset was taken from a sqlite database from the Software Carpentry SQL lesson.

Two DataFrames have been pre-loaded: site and visited. Explore them in the IPython Shell and take note of their structure and column names. Your task is to perform a 1-to-1 merge of these two DataFrames using the 'name' column of site and the 'site' column of visited.

In [None]:
# Merge the DataFrames: o2o
o2o = pd.merge(left=site, right=visited, left_on='name', right_on='site')

# Print o2o
print(o2o)
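
Since site and visited are pre-loaded in the exercise, the cell above does not run on its own. The sketch below builds miniature stand-ins with made-up rows (only the 'name' and 'site' column names come from the exercise text; the other columns and all values are assumptions) to show what the merge produces.

```python
import pandas as pd

# Hypothetical miniature versions of `site` and `visited`
site = pd.DataFrame({"name": ["DR-1", "DR-3", "MSK-4"],
                     "lat": [-49.85, -47.15, -48.87]})
visited = pd.DataFrame({"ident": [619, 734, 837],
                        "site": ["DR-1", "DR-3", "MSK-4"]})

# Each key appears once on both sides, so every row finds exactly one partner
o2o = pd.merge(left=site, right=visited, left_on="name", right_on="site")

print(o2o.columns.tolist())  # ['name', 'lat', 'ident', 'site']
print(len(o2o))              # 3
```

Because the key columns have different names, both 'name' and 'site' are kept in the result.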


### Types of merges

- one-to-one merge: there are no duplicate values in the key column of either DataFrame.
- one-to-many/many-to-one merge: one DataFrame has duplicate values in the key column, while the other has unique keys.
- many-to-many merge: neither DataFrame has unique keys. For each duplicated key, every pairwise combination of matching rows is created.
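
The pairwise behaviour of a many-to-many merge is easiest to see by counting rows. This is a small sketch on made-up data (`left`, `right` and their columns are hypothetical); it also uses the `validate` argument of `pd.merge`, which raises an error when the keys do not have the expected uniqueness.

```python
import pandas as pd

# Key 'a' is duplicated on both sides; key 'b' is unique on both sides
left = pd.DataFrame({"key": ["a", "a", "b"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "a", "a", "b"],
                      "right_val": [10, 20, 30, 40]})

# Many-to-many: 'a' yields 2 x 3 = 6 rows, 'b' yields 1 x 1 = 1 row
m2m = pd.merge(left, right, on="key")
print(len(m2m))  # 7

# validate="one_to_one" rejects the merge because the keys are not unique
try:
    pd.merge(left, right, on="key", validate="one_to_one")
    is_one_to_one = True
except pd.errors.MergeError:
    is_one_to_one = False
print(is_one_to_one)  # False
```

Passing `validate` is a cheap way to catch an accidental many-to-many merge before it silently inflates the row count.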