Creating, Reading and Writing

In [1]:
import pandas as pd

pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

,Yes,No
0,50,131
1,21,2


By default the rows are labeled 0, 1, 2, and so on. We can assign our own row labels by passing an index parameter to the constructor:

In [2]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])

,Bob,Sue
Product A,I liked it.,Pretty good.
Product B,It was awful.,Bland.


Series

In [3]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

Reading data files

In [4]:
file_test = pd.read_csv("Reading_data_files.csv")

file_test.shape

(6, 3)

In [5]:
file_test.head()

,Product_A,Product_B,Product_C
0,30,21,9
1,35,34,1
2,41,11,11
3,35,27,3
4,25,14,2
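
Writing goes the other way. As a minimal sketch, assuming DataFrame.to_csv is the intended counterpart to read_csv, the block below writes a table to an in-memory buffer and reads it back. The values are copied from the table printed in this document; the names products, buffer and round_trip are chosen for illustration, and the buffer stands in for a real file path.

```python
import io

import pandas as pd

# Table rebuilt inline so the sketch does not need Reading_data_files.csv.
products = pd.DataFrame({
    'Product_A': [30, 35, 41, 35, 25, 11],
    'Product_B': [21, 34, 11, 27, 14, 31],
    'Product_C': [9, 1, 11, 3, 2, 18],
})

# Write to an in-memory text buffer; a file path works the same way.
# index=False keeps the row labels out of the output.
buffer = io.StringIO()
products.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV text back into a new DataFrame.
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(round_trip.shape)
print(round_trip.equals(products))
```

Without index=False, the row labels are written as an extra first column, which read_csv would then load as an ordinary column unless told otherwise.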


Indexing, Selecting & Assigning

In [6]:
import pandas as pd
reviews = pd.read_csv("Reading_data_files.csv")
pd.set_option('display.max_rows', 5)

Native accessors

In [7]:
reviews

,Product_A,Product_B,Product_C
0,30,21,9
1,35,34,1
...,...,...,...
4,25,14,2
5,11,31,18


In [8]:
reviews.Product_A

0    30
1    35
     ..
4    25
5    11
Name: Product_A, Length: 6, dtype: int64

In [9]:
reviews['Product_A']

0    30
1    35
     ..
4    25
5    11
Name: Product_A, Length: 6, dtype: int64

In [10]:
reviews['Product_A'][0]

30

In [11]:
reviews.iloc[0]

Product_A    30
Product_B    21
Product_C     9
Name: 0, dtype: int64

In [12]:
reviews.iloc[:, 0]

0    30
1    35
     ..
4    25
5    11
Name: Product_A, Length: 6, dtype: int64

In [13]:
reviews.iloc[:3, 0]

0    30
1    35
2    41
Name: Product_A, dtype: int64

In [14]:
reviews.iloc[1:3, 0]

1    35
2    41
Name: Product_A, dtype: int64

In [15]:
reviews.iloc[[0, 1, 2], 0]

0    30
1    35
2    41
Name: Product_A, dtype: int64

In [16]:
reviews.iloc[-2:]

,Product_A,Product_B,Product_C
4,25,14,2
5,11,31,18


In [17]:
reviews.loc[0, 'Product_A']

30

In [18]:
reviews.loc[:, ['Product_A', 'Product_B']]

,Product_A,Product_B
0,30,21
1,35,34
...,...,...
4,25,14
5,11,31


In [19]:
cols = ['Product_A', 'Product_B']
indices = [0, 2]
reviews.loc[indices, cols]

,Product_A,Product_B
0,30,21
2,41,11


In [20]:
cols = ['Product_A', 'Product_C']
reviews.loc[:2, cols]

,Product_A,Product_C
0,30,9
1,35,1
2,41,11


Choosing between loc and iloc
When choosing or transitioning between loc and iloc, there is one "gotcha" worth keeping in mind: the two indexers treat the end of a slice differently.

iloc selects by position and follows standard Python slicing, where the first element of the range is included and the last one excluded. So 0:10 will select entries 0,...,9. loc, meanwhile, selects by label and includes both endpoints. So on the default integer index, 0:10 will select entries 0,...,10.
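
A minimal sketch of this difference, with the six rows shown above rebuilt inline so that it does not depend on Reading_data_files.csv. The same slice 0:3 is applied through each indexer. The string labels 'a' to 'f' are hypothetical, added only to show that loc slices on labels rather than positions.

```python
import pandas as pd

# Table rebuilt inline from the output printed earlier.
reviews = pd.DataFrame({
    'Product_A': [30, 35, 41, 35, 25, 11],
    'Product_B': [21, 34, 11, 27, 14, 31],
    'Product_C': [9, 1, 11, 3, 2, 18],
})

# Same slice, different indexers.
by_position = reviews.iloc[0:3]  # positions 0, 1, 2 (end excluded)
by_label = reviews.loc[0:3]      # labels 0, 1, 2, 3 (end included)

print(by_position.index.tolist())  # [0, 1, 2]
print(by_label.index.tolist())     # [0, 1, 2, 3]

# Hypothetical string labels, to show that loc works on labels.
relabeled = reviews.copy()
relabeled.index = list('abcdef')

label_slice = relabeled.loc['a':'c']   # 'a', 'b', 'c' (end included)
position_slice = relabeled.iloc[0:2]   # first two rows (end excluded)

print(label_slice.index.tolist())      # ['a', 'b', 'c']
print(position_slice.index.tolist())   # ['a', 'b']
```

This is why reviews.loc[:2, cols] above returns three rows, while reviews.iloc[:3, 0] is needed to get the same three rows by position.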

Manipulating the index

In [21]:
reviews.set_index("Product_C")

Product_C,Product_A,Product_B
9,30,21
1,35,34
...,...,...
2,25,14
18,11,31
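
Note that set_index returns a new DataFrame rather than changing reviews, which is why Product_C still appears as a column in the cells that follow. A minimal sketch, with the table rebuilt inline and the names indexed and restored chosen for illustration:

```python
import pandas as pd

# Table rebuilt inline from the output printed earlier.
reviews = pd.DataFrame({
    'Product_A': [30, 35, 41, 35, 25, 11],
    'Product_B': [21, 34, 11, 27, 14, 31],
    'Product_C': [9, 1, 11, 3, 2, 18],
})

# set_index returns a new object; reviews itself is unchanged.
indexed = reviews.set_index('Product_C')

print('Product_C' in reviews.columns)  # True
print('Product_C' in indexed.columns)  # False
print(indexed.index.name)              # Product_C

# reset_index moves the index back into an ordinary column.
restored = indexed.reset_index()
print(restored.columns.tolist())
```

To keep the new index, assign the result back, for example reviews = reviews.set_index("Product_C").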


Conditional selection

In [22]:
reviews.Product_A == 30

0     True
1    False
     ...  
4    False
5    False
Name: Product_A, Length: 6, dtype: bool

In [23]:
reviews.loc[reviews.Product_A == 30]

,Product_A,Product_B,Product_C
0,30,21,9


In [24]:
reviews.loc[(reviews.Product_A == 30) & (reviews.Product_B >= 21)]

,Product_A,Product_B,Product_C
0,30,21,9


In [25]:
reviews.loc[reviews.Product_A.isin([30, 41])]

,Product_A,Product_B,Product_C
0,30,21,9
2,41,11,11


In [26]:
reviews.loc[reviews.Product_A.notnull()]

,Product_A,Product_B,Product_C
0,30,21,9
1,35,34,1
...,...,...,...
4,25,14,2
5,11,31,18


Assigning data

In [27]:
reviews['critic'] = 'everyone'
reviews['critic']

0    everyone
1    everyone
       ...   
4    everyone
5    everyone
Name: critic, Length: 6, dtype: object

In [28]:
reviews['index_backwards'] = range(len(reviews), 0, -1)
reviews['index_backwards']

0    6
1    5
    ..
4    2
5    1
Name: index_backwards, Length: 6, dtype: int64