### Glob

If a directory holds several files, it is usually easier to load them with glob than to read each one by hand. Let's say you have 5 files. Loading them manually would look like this:

```
import pandas

first_df = pandas.read_csv("file1.csv")
second_df = pandas.read_csv("file2.csv")
third_df = pandas.read_csv("file3.csv")
fourth_df = pandas.read_csv("file4.csv")
fifth_df = pandas.read_csv("file5.csv")
```

As you can see, that gets tedious quickly. Instead, you can load the files in a loop using glob. First, we will import glob and pandas and create an empty dataframe to hold the combined data:

In [14]:
import glob
import pandas

iris_dataframe = pandas.DataFrame()

Next, we will use the `glob.glob()` function to list the files in a directory that match a pattern. Note that `*` matches any run of characters, so `*.csv` means "any file name that ends in '.csv'." If we wanted only the files named file`x`.csv, we could use `file*.csv`.

In [15]:
for file in glob.glob("*.csv"):
    keep_cols = ['petal_length', 'species']
    this_file_dataframe = pandas.read_csv(file, usecols=keep_cols)
    # DataFrame.append was removed in pandas 2.0, so use pandas.concat to add the new rows
    iris_dataframe = pandas.concat([iris_dataframe, this_file_dataframe], ignore_index=True)

print(iris_dataframe)

     petal_length          species
0             4.7  Iris-versicolor
1             4.5  Iris-versicolor
2             4.9  Iris-versicolor
3             4.0  Iris-versicolor
4             4.6  Iris-versicolor
5             4.5  Iris-versicolor
6             4.7  Iris-versicolor
7             3.3  Iris-versicolor
8             4.6  Iris-versicolor
9             3.9  Iris-versicolor
10            3.5  Iris-versicolor
11            4.2  Iris-versicolor
12            4.0  Iris-versicolor
13            4.7  Iris-versicolor
14            3.6  Iris-versicolor
15            4.4  Iris-versicolor
16            4.5  Iris-versicolor
17            4.1  Iris-versicolor
18            4.5  Iris-versicolor
19            3.9  Iris-versicolor
20            4.8  Iris-versicolor
21            4.0  Iris-versicolor
22            4.9  Iris-versicolor
23            4.7  Iris-versicolor
24            4.3  Iris-versicolor
25            4.4  Iris-versicolor
26            4.8  Iris-versicolor
27            5.0  I
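
To make the wildcard rules concrete, here is a minimal sketch that builds a throwaway directory and compares two patterns. The file names (`file1.csv`, `file2.csv`, `iris1.csv`, `notes.txt`) are made up for illustration, and the results are sorted because glob returns matches in arbitrary order.

```python
import glob
import os
import tempfile

# Hypothetical file names, created in a temporary directory just for this demo
names = ["file1.csv", "file2.csv", "iris1.csv", "notes.txt"]

with tempfile.TemporaryDirectory() as tmp_dir:
    for name in names:
        with open(os.path.join(tmp_dir, name), "w") as handle:
            handle.write("petal_length,species\n1.4,Iris-setosa\n")

    # "*.csv" matches every name that ends in ".csv"
    all_csv = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp_dir, "*.csv"))
    )

    # "file*.csv" matches only names that start with "file" and end in ".csv"
    file_csv = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp_dir, "file*.csv"))
    )

    # glob.iglob accepts the same patterns but yields paths one at a time
    # instead of building the whole list first
    lazy_csv = sorted(
        os.path.basename(p) for p in glob.iglob(os.path.join(tmp_dir, "*.csv"))
    )

print(all_csv)   # ['file1.csv', 'file2.csv', 'iris1.csv']
print(file_csv)  # ['file1.csv', 'file2.csv']
```

For a handful of files, `glob.glob()` and `glob.iglob()` behave the same in a `for` loop; the iterator version mostly matters when a directory holds a very large number of matches.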

Now the rows from every file sit in a single dataframe. To spot-check the result, `sample(3)` returns three rows chosen at random:

In [16]:
print(iris_dataframe.sample(3))  # three random rows, so your output will differ

     petal_length          species
30            3.8  Iris-versicolor
119           1.5      Iris-setosa
47            4.3  Iris-versicolor
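
Once the files are combined, nothing in the dataframe records which file a row came from. If you need that later, one option is to tag each row while loading. The sketch below is only an illustration: the `source_file` column name, the two file names, and the values in them are assumptions, not part of the original data.

```python
import glob
import os
import tempfile

import pandas

with tempfile.TemporaryDirectory() as tmp_dir:
    # Two tiny stand-in files; names and values are made up for the demo
    samples = {
        "iris1.csv": "petal_length,species\n4.7,Iris-versicolor\n4.5,Iris-versicolor\n",
        "iris2.csv": "petal_length,species\n1.4,Iris-setosa\n",
    }
    for name, text in samples.items():
        with open(os.path.join(tmp_dir, name), "w") as handle:
            handle.write(text)

    frames = []
    for path in sorted(glob.glob(os.path.join(tmp_dir, "*.csv"))):
        frame = pandas.read_csv(path)
        # source_file is a hypothetical extra column recording the row's origin
        frame["source_file"] = os.path.basename(path)
        frames.append(frame)

# Combine all pieces at once and renumber the rows from 0
combined = pandas.concat(frames, ignore_index=True)

# Recover the rows of one file by filtering on the new column
from_iris2 = combined[combined["source_file"] == "iris2.csv"]
print(from_iris2)
```

Collecting the pieces in a list and calling `pandas.concat` once also avoids copying the growing dataframe on every pass through the loop.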


In [30]:
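import numpy as np

df = iris_dataframe  # a second name for the same dataframe, not a copy
# Multiply petal_length by 2.54, but only in the rows where species is Iris-virginica
df.loc[df['species'] == 'Iris-virginica', 'petal_length'] *= 2.54

df.tail()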


     petal_length      species
145           1.4  Iris-setosa
146           1.6  Iris-setosa
147           1.4  Iris-setosa
148           1.5  Iris-setosa
149           1.4  Iris-setosa
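
The last cell changes only some rows, which is why the Iris-setosa rows in the tail are untouched. As a minimal sketch of how that kind of conditional update works, here is the same pattern on a three-row dataframe; the measurements are made up so the result is easy to check by hand.

```python
import pandas

# Made-up measurements for the demo
df = pandas.DataFrame({
    "petal_length": [1.0, 2.0, 4.0],
    "species": ["Iris-setosa", "Iris-virginica", "Iris-virginica"],
})

# Boolean mask: True for the rows that should change
is_virginica = df["species"] == "Iris-virginica"

# .loc[rows, column] selects just those cells, and *= updates them in place
df.loc[is_virginica, "petal_length"] *= 2.54

print(df["petal_length"].tolist())  # [1.0, 5.08, 10.16]
```

The Iris-setosa row keeps its original value because its mask entry is `False`.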
