# Combining Data

Data does not always come in one huge file. It is important to be able to combine data from several sources and then clean the result.

## Concatenation

```python
import pandas as pd

# Stack df2 below df1 (row-wise) and renumber the index 0..n-1
concatenated = pd.concat([df1, df2], ignore_index=True)
```
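A minimal sketch of what `ignore_index=True` changes, assuming two small, hypothetical DataFrames `df1` and `df2` with the same columns (the names and values are illustrative only):

```python
import pandas as pd

# Two small, hypothetical DataFrames with the same columns
df1 = pd.DataFrame({'name': ['Ann', 'Bob'], 'score': [90, 85]})
df2 = pd.DataFrame({'name': ['Cat', 'Dan'], 'score': [70, 95]})

# Without ignore_index, each frame keeps its own index, so labels repeat
stacked = pd.concat([df1, df2])
print(stacked.index.tolist())  # [0, 1, 0, 1]

# With ignore_index=True, pandas builds a fresh 0..n-1 index
concatenated = pd.concat([df1, df2], ignore_index=True)
print(concatenated.index.tolist())  # [0, 1, 2, 3]
print(concatenated)
```

Duplicate index labels are easy to miss and can make later lookups with `.loc` return more rows than expected, which is why resetting the index is usually the safer default when the original row labels carry no meaning.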

### Concatenating many files

* Use Python's general-purpose features alongside pandas for data cleaning
* To concatenate DataFrames:
    * Pass them to `pd.concat()` as a list
    * With only a few datasets, you can load each one individually
    * But what if there are thousands?
* Solution: the `glob()` function finds files whose names match a pattern

### Globbing

* Pattern matching for file names
* Wildcards: `*` and `?`
    * `*` matches any sequence of characters, e.g. `*.csv` matches any CSV file
    * `?` matches any single character, e.g. `file_?.csv`
* Returns a list of file names
* Use this list to load each file into its own DataFrame
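The difference between the two wildcards can be checked with a small sketch that creates a few empty, hypothetical files in a temporary directory. The results are sorted because `glob.glob()` does not guarantee any particular order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Create empty, hypothetical files to match against
    for name in ['file_1.csv', 'file_2.csv', 'file_10.csv', 'notes.txt']:
        open(os.path.join(tmpdir, name), 'w').close()

    # Escape the directory path so only the file-name part acts as a pattern
    base = glob.escape(tmpdir)

    # '*' matches any run of characters, including none
    all_csv = sorted(os.path.basename(p)
                     for p in glob.glob(os.path.join(base, '*.csv')))

    # '?' matches exactly one character, so 'file_10.csv' is not matched
    single = sorted(os.path.basename(p)
                    for p in glob.glob(os.path.join(base, 'file_?.csv')))

print(all_csv)  # ['file_1.csv', 'file_10.csv', 'file_2.csv']
print(single)   # ['file_1.csv', 'file_2.csv']
```

Note that `sorted()` compares strings character by character, so `file_10.csv` sorts before `file_2.csv`.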

#### Plan

* Load the files found by globbing into pandas
* Add the DataFrames to a list
* Concatenate multiple datasets at once

```python
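import glob

import pandas as pd

# Find all CSV files in the current working directory
csv_files = glob.glob('*.csv')

# Read each file into its own DataFrame and collect them in a list
list_data = []
for filename in csv_files:
    data = pd.read_csv(filename)
    list_data.append(data)

# Combine all DataFrames into one, with a fresh 0..n-1 index
combined = pd.concat(list_data, ignore_index=True)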
```
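The same plan can be written more compactly with a list comprehension. This is a sketch that assumes two hypothetical files, `weather_1.csv` and `weather_2.csv`, written to a temporary directory so it runs anywhere; sorting the file list makes the row order reproducible.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmpdir:
    # Write two small, hypothetical CSV files to load back in
    pd.DataFrame({'city': ['Austin'], 'temp': [30]}).to_csv(
        os.path.join(tmpdir, 'weather_1.csv'), index=False)
    pd.DataFrame({'city': ['Boston'], 'temp': [18]}).to_csv(
        os.path.join(tmpdir, 'weather_2.csv'), index=False)

    # glob returns files in arbitrary order, so sort for a stable result
    csv_files = sorted(glob.glob(os.path.join(glob.escape(tmpdir), '*.csv')))

    # Load every file into a list of DataFrames, then stack them in one call
    combined = pd.concat([pd.read_csv(f) for f in csv_files],
                         ignore_index=True)

print(combined)
```

If the pattern matches no files, `pd.concat()` receives an empty list and raises a `ValueError`, so it can be worth checking `csv_files` before concatenating.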

## Merging Data

* Similar to joining tables in SQL
* Combine datasets based on common columns (keys)

![ExampleMergingData](sample_data/ExampleMergingData.PNG)

```python
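import pandas as pd

# The key is named 'state' in the left table and 'name' in the right,
# so use left_on/right_on instead of on (on=None is the default)
pd.merge(left=state_populations, right=state_codes,
         on=None, left_on='state', right_on='name')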
```
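A runnable sketch of the merge above, using small hypothetical versions of `state_populations` and `state_codes` (the population values are placeholders, not real figures). Because the key columns have different names, both are kept in the result, so the redundant one can be dropped afterwards.

```python
import pandas as pd

# Hypothetical tables: the key column is named differently in each
state_populations = pd.DataFrame({
    'state': ['California', 'Texas', 'Florida'],
    'population': [1000, 2000, 3000],  # placeholder values
})
state_codes = pd.DataFrame({
    'name': ['California', 'Florida', 'Texas'],
    'ANSI': ['CA', 'FL', 'TX'],
})

# Default how='inner': keep only keys found in both tables,
# in the order they appear in the left table
merged = pd.merge(left=state_populations, right=state_codes,
                  left_on='state', right_on='name')
print(merged.columns.tolist())  # ['state', 'population', 'name', 'ANSI']

# 'name' duplicates 'state' after the merge, so drop it
tidy = merged.drop(columns='name')
print(tidy)
```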

### Types of Merges

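* One-to-one: each key value appears at most once in both DataFrames
* Many-to-one / one-to-many: key values repeat in one DataFrame but are unique in the other; each unique row is copied onto every matching row
* Many-to-many: key values repeat in both DataFrames; every matching pair of rows appears in the result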
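The three merge types can be seen on tiny hypothetical DataFrames. This sketch also uses `pd.merge()`'s `validate` argument, which checks that the keys really have the expected relationship and raises `pandas.errors.MergeError` if they do not; all table and column names here are illustrative.

```python
import pandas as pd

# One-to-one: each key appears once on both sides
left = pd.DataFrame({'key': ['a', 'b'], 'x': [1, 2]})
right = pd.DataFrame({'key': ['a', 'b'], 'y': [3, 4]})
one_to_one = pd.merge(left, right, on='key', validate='one_to_one')

# Many-to-one: store 's1' repeats on the left but is unique on the right,
# so its city is copied onto both matching order rows
orders = pd.DataFrame({'store': ['s1', 's1', 's2'], 'amount': [10, 20, 30]})
stores = pd.DataFrame({'store': ['s1', 's2'], 'city': ['Austin', 'Boston']})
many_to_one = pd.merge(orders, stores, on='store', validate='many_to_one')

# Many-to-many: key 'k' appears twice on each side, giving 2 x 2 = 4 rows
dup_left = pd.DataFrame({'key': ['k', 'k'], 'x': [1, 2]})
dup_right = pd.DataFrame({'key': ['k', 'k'], 'y': [3, 4]})
many_to_many = pd.merge(dup_left, dup_right, on='key',
                        validate='many_to_many')
print(len(many_to_many))  # 4

# validate raises MergeError when the data break the assumed relationship
raised = False
try:
    pd.merge(orders, stores, on='store', validate='one_to_one')
except pd.errors.MergeError as err:
    raised = True
    print('Not one-to-one:', err)
```

Many-to-many merges can multiply row counts quickly, so passing `validate` is a cheap way to catch unexpected duplicate keys.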