# pandas

[pandas](https://pandas.pydata.org/) is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

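Use `import` to load pandas, conventionally under the alias `pd`. A DataFrame can then be built from a dictionary that maps column names to lists of values.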

In [3]:
import pandas as pd

col1 = list(range(1, 6))
col2 = list(range(7, 12))
col3 = list(range(13, 18))

d = {'col1': col1, 'col2': col2, 'col3': col3}

df = pd.DataFrame(d)
df.index = ['one', 'two', 'three', 'four', 'five']
print(df)
print(type(df))

       col1  col2  col3
one       1     7    13
two       2     8    14
three     3     9    15
four      4    10    16
five      5    11    17
<class 'pandas.core.frame.DataFrame'>
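
The row labels can also be supplied when the DataFrame is created, rather than assigned afterwards. A minimal sketch, reusing the same column values as above; the variable name `df2` is only illustrative:

```python
import pandas as pd

# Same data as above, built in one step; `df2` is an illustrative name.
d = {
    'col1': list(range(1, 6)),
    'col2': list(range(7, 12)),
    'col3': list(range(13, 18)),
}
df2 = pd.DataFrame(d, index=['one', 'two', 'three', 'four', 'five'])

print(df2.shape)                 # (rows, columns)
print(df2.loc['three', 'col2'])  # look up one cell by row and column label
```

Passing `index=` to the constructor keeps the data and its labels together in a single statement.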


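Read a CSV file with `pd.read_csv`, then select rows and columns. A slice such as `iris[0:5]` returns the first five rows, `.loc` selects by label, and `.iloc` selects by integer position.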

In [16]:
iris = pd.read_csv("../data/iris.csv")
print(iris[0:5])

print(iris.loc[[0, 1, 2]])

print(iris.loc[range(0,3), ['Sepal.Width', 'Sepal.Length']])

print(iris.iloc[[0, 1, 2], [1, 0]])

   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
   Sepal.Width  Sepal.Length
0          3.5           5.1
1          3.0           4.9
2          3.2           4.7
   Sepal.Width  Sepal.Length
0          3.5           5.1
1          3.0           4.9
2          3.2           4.7
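
The last two outputs above are identical only because the row labels of `iris` happen to equal the row positions. The sketch below uses a small hand-made DataFrame (the first three iris rows shown above, relabeled with the hypothetical index `a`, `b`, `c`) to illustrate how `.loc` selects by label while `.iloc` selects by integer position:

```python
import pandas as pd

# Hand-made example; the index labels 'a', 'b', 'c' are hypothetical.
small = pd.DataFrame(
    {'Sepal.Length': [5.1, 4.9, 4.7], 'Sepal.Width': [3.5, 3.0, 3.2]},
    index=['a', 'b', 'c'],
)

by_label = small.loc[['a', 'b'], ['Sepal.Width', 'Sepal.Length']]
by_position = small.iloc[[0, 1], [1, 0]]
print(by_label.equals(by_position))  # True

# Label slices include the end label; position slices exclude the end position.
print(len(small.loc['a':'b']))  # 2
print(len(small.iloc[0:1]))     # 1
```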


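Note the difference between single and double square brackets when selecting columns: `iris["Sepal.Length"]` returns a pandas Series, while `iris[["Sepal.Length"]]` returns a pandas DataFrame. To subset the rows of a DataFrame by a condition, pass a boolean Series (such as the result of a comparison on a column) inside the brackets.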

In [14]:
print(type(iris["Sepal.Length"]))

print(type(iris[["Sepal.Length"]]))

large = iris["Sepal.Length"] > 7.5

print(iris[large])

<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
105           7.6          3.0           6.6          2.1  virginica
117           7.7          3.8           6.7          2.2  virginica
118           7.7          2.6           6.9          2.3  virginica
122           7.7          2.8           6.7          2.0  virginica
131           7.9          3.8           6.4          2.0  virginica
135           7.7          3.0           6.1          2.3  virginica
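
To make the distinction concrete, the sketch below uses a tiny hand-made DataFrame (three values taken from the outputs above; the name `small` is illustrative). Comparing a Series with a number yields a boolean Series, and that mask is what selects the rows:

```python
import pandas as pd

# Hand-made example, not the full iris data.
small = pd.DataFrame({'Sepal.Length': [5.1, 7.6, 7.9],
                      'Species': ['setosa', 'virginica', 'virginica']})

column = small['Sepal.Length']     # single brackets: Series
table = small[['Sepal.Length']]    # double brackets: one-column DataFrame
print(column.shape, table.shape)   # (3,) (3, 1)

mask = column > 7.5                # boolean Series, one value per row
print(mask.tolist())               # [False, True, True]
print(small[mask].index.tolist())  # [1, 2]
```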


Since pandas is built on NumPy, NumPy logical functions such as `np.logical_and` can combine conditions when subsetting.

In [15]:
import numpy as np

wanted = np.logical_and(iris["Sepal.Length"] > 7.5, iris["Sepal.Length"] < 7.7)
print(iris[wanted])

     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
105           7.6          3.0           6.6          2.1  virginica
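
pandas also overloads the `&`, `|`, and `~` operators for element-wise logic on boolean Series, so the same filter can be written without calling NumPy directly. A minimal sketch on hand-made values (not the full iris data); the parentheses are required because `&` binds more tightly than the comparison operators:

```python
import numpy as np
import pandas as pd

# Hand-made values for illustration only.
lengths = pd.Series([7.6, 7.7, 7.9, 5.1], name='Sepal.Length')

with_numpy = np.logical_and(lengths > 7.5, lengths < 7.7)
with_operator = (lengths > 7.5) & (lengths < 7.7)

print(with_numpy.equals(with_operator))  # True
print(lengths[with_operator].tolist())   # [7.6]
```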


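Using a for loop (sub-optimal). `iterrows()` yields one `(index, row)` pair per row, where the row is a pandas Series; this is slow on large DataFrames.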

In [21]:
for label, row in iris.iterrows():
    print(label)  # the row's index label
    print(row)    # the row's values as a Series

0
Sepal.Length       5.1
Sepal.Width        3.5
Petal.Length       1.4
Petal.Width        0.2
Species         setosa
Name: 0, dtype: object
1
Sepal.Length       4.9
Sepal.Width          3
Petal.Length       1.4
Petal.Width        0.2
Species         setosa
Name: 1, dtype: object
2
Sepal.Length       4.7
Sepal.Width        3.2
Petal.Length       1.3
Petal.Width        0.2
Species         setosa
Name: 2, dtype: object
3
Sepal.Length       4.6
Sepal.Width        3.1
Petal.Length       1.5
Petal.Width        0.2
Species         setosa
Name: 3, dtype: object
4
Sepal.Length         5
Sepal.Width        3.6
Petal.Length       1.4
Petal.Width        0.2
Species         setosa
Name: 4, dtype: object
5
Sepal.Length       5.4
Sepal.Width        3.9
Petal.Length       1.7
Petal.Width        0.4
Species         setosa
Name: 5, dtype: object
6
Sepal.Length       4.6
Sepal.Width        3.4
Petal.Length       1.4
Petal.Width        0.3
Species         setosa
Name: 6, dtype: object
...

71
Sepal.Length           6.1
Sepal.Width            2.8
Petal.Length             4
Petal.Width            1.3
Species         versicolor
Name: 71, dtype: object
72
Sepal.Length           6.3
Sepal.Width            2.5
Petal.Length           4.9
Petal.Width            1.5
Species         versicolor
Name: 72, dtype: object
73
Sepal.Length           6.1
Sepal.Width            2.8
Petal.Length           4.7
Petal.Width            1.2
Species         versicolor
Name: 73, dtype: object
74
Sepal.Length           6.4
Sepal.Width            2.9
Petal.Length           4.3
Petal.Width            1.3
Species         versicolor
Name: 74, dtype: object
75
Sepal.Length           6.6
Sepal.Width              3
Petal.Length           4.4
Petal.Width            1.4
Species         versicolor
Name: 75, dtype: object
76
Sepal.Length           6.8
Sepal.Width            2.8
Petal.Length           4.8
Petal.Width            1.4
Species         versicolor
Name: 76, dtype: object
...

109
Sepal.Length          7.2
Sepal.Width           3.6
Petal.Length          6.1
Petal.Width           2.5
Species         virginica
Name: 109, dtype: object
110
Sepal.Length          6.5
Sepal.Width           3.2
Petal.Length          5.1
Petal.Width             2
Species         virginica
Name: 110, dtype: object
111
Sepal.Length          6.4
Sepal.Width           2.7
Petal.Length          5.3
Petal.Width           1.9
Species         virginica
Name: 111, dtype: object
112
Sepal.Length          6.8
Sepal.Width             3
Petal.Length          5.5
Petal.Width           2.1
Species         virginica
Name: 112, dtype: object
113
Sepal.Length          5.7
Sepal.Width           2.5
Petal.Length            5
Petal.Width             2
Species         virginica
Name: 113, dtype: object
114
Sepal.Length          5.8
Sepal.Width           2.8
Petal.Length          5.1
Petal.Width           2.4
Species         virginica
Name: 114, dtype: object
...
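
The loop is sub-optimal because `iterrows()` builds a new Series for every row. When the goal is a per-row calculation rather than printing, a column-wise (vectorized) expression usually does the same work in one step. A sketch on a hand-made DataFrame; the derived `Sepal.Area` column is a hypothetical example, not part of the iris data:

```python
import pandas as pd

# First three iris rows shown earlier, typed in by hand.
small = pd.DataFrame({'Sepal.Length': [5.1, 4.9, 4.7],
                      'Sepal.Width': [3.5, 3.0, 3.2]})

# Row-by-row version: one Python-level iteration per row.
areas_loop = []
for label, row in small.iterrows():
    areas_loop.append(row['Sepal.Length'] * row['Sepal.Width'])

# Vectorized version: the multiplication is applied to whole columns at once.
# 'Sepal.Area' is a hypothetical column name used only for this sketch.
small['Sepal.Area'] = small['Sepal.Length'] * small['Sepal.Width']

print(small)
```

Both versions compute the same products; the vectorized form avoids the per-row overhead and keeps the result aligned with the DataFrame's index.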

Using `apply` to call a function on every value in a column

In [27]:
iris["SPECIES"] = iris["Species"].apply(str.upper)
iris.iloc[[0, 1, 2]]

   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species SPECIES
0           5.1          3.5           1.4          0.2  setosa  SETOSA
1           4.9          3.0           1.4          0.2  setosa  SETOSA
2           4.7          3.2           1.3          0.2  setosa  SETOSA
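
For string columns, pandas also provides vectorized string methods through the `.str` accessor, which give the same values as `apply(str.upper)` here. A minimal sketch on a hand-made Series (the values are illustrative):

```python
import pandas as pd

# Hand-made Series standing in for the Species column.
species = pd.Series(['setosa', 'versicolor', 'virginica'], name='Species')

via_apply = species.apply(str.upper)  # calls str.upper on each value
via_accessor = species.str.upper()    # vectorized string method

print(via_accessor.tolist())  # ['SETOSA', 'VERSICOLOR', 'VIRGINICA']
print(via_apply.tolist() == via_accessor.tolist())  # True
```

One practical difference: `.str.upper()` passes missing values through unchanged, whereas `str.upper` raises a `TypeError` when given a non-string value such as `NaN`.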
