# Accessing Elements in Pandas DataFrames

As with a pandas Series, the elements of a DataFrame can be accessed using:
- chained brackets, e.g. `df[column][row]`
- the `loc` indexer, e.g. `df.loc[row, column]`

Let's look at some examples.

In [53]:
```python
import pandas as pd
import numpy as np
```

In [54]:
```python
data = {
    'store1': pd.Series(data=np.random.randint(0,100,size=3), index=['mango','papaya','kiwi']),
    'store2': pd.Series(data=np.random.randint(0,100,size=4), index=['mango','papaya', 'avocado', 'kiwi']),
    'store3': pd.Series(data=np.random.randint(0,100,size=3), index=['mango','papaya','kiwi'])
}

df = pd.DataFrame(data)
df
```

| | store1 | store2 | store3 |
|---|---|---|---|
| avocado | NaN | 16 | NaN |
| kiwi | 71.0 | 92 | 26.0 |
| mango | 83.0 | 70 | 99.0 |
| papaya | 75.0 | 21 | 27.0 |


When accessing individual elements with chained brackets, the column label must come first, i.e. in the form `dataframe[column][row]`: the first pair of brackets selects a column (a Series), and the second selects a row from that Series.

In [55]:
```python
df['store1']['kiwi']
```

71.0
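Chained brackets perform two separate lookups. As a minimal sketch (using a small hand-built DataFrame as a stand-in for the random one above), the same element can be read in a single lookup with `loc` or `at`, both of which take the row label first:

```python
import pandas as pd

# Hand-built stand-in for the random example DataFrame
df = pd.DataFrame(
    {'store1': [71.0, 83.0], 'store3': [26.0, 99.0]},
    index=['kiwi', 'mango'],
)

# Chained brackets: column label first, then row label
chained = df['store1']['kiwi']

# loc and at: row label first, then column label, in one lookup
with_loc = df.loc['kiwi', 'store1']
with_at = df.at['kiwi', 'store1']  # at only returns single values

print(chained, with_loc, with_at)
```

`at` is restricted to scalar access, while `loc` also accepts lists and slices.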

Passing a list of column labels returns a subset of the DataFrame, which is itself a DataFrame, even when the list contains a single label.

In [56]:
```python
df[['store1']]
```

| | store1 |
|---|---|
| avocado | NaN |
| kiwi | 71.0 |
| mango | 83.0 |
| papaya | 75.0 |


In [57]:
```python
type(df[['store1']])
```

pandas.core.frame.DataFrame

In [58]:
```python
# selecting several columns returns a DataFrame
df[['store1', 'store3']]
```

| | store1 | store3 |
|---|---|---|
| avocado | NaN | NaN |
| kiwi | 71.0 | 26.0 |
| mango | 83.0 | 99.0 |
| papaya | 75.0 | 27.0 |


In [59]:
```python
# selecting a single column returns a Series
df['store1']
```

```
avocado     NaN
kiwi       71.0
mango      83.0
papaya     75.0
Name: store1, dtype: float64
```

In [60]:
```python
type(df['store1'])
```

pandas.core.series.Series
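The difference between a single label and a one-element list can be checked directly. This is a small sketch on a hand-built DataFrame (an assumption standing in for `df` above):

```python
import pandas as pd

# Hand-built stand-in for the example DataFrame
df = pd.DataFrame({'store1': [71.0, 83.0], 'store3': [26.0, 99.0]},
                  index=['kiwi', 'mango'])

as_series = df['store1']   # single label -> Series (one dimension)
as_frame = df[['store1']]  # list with one label -> DataFrame (two dimensions)

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```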

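With `loc`, the first argument refers to the row index: passing a column label there raises a `KeyError`. Columns can be selected by passing a second argument, as in `df.loc[row, column]`.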

In [61]:
```python
subdf = df.loc[['avocado']]
subdf
```

| | store1 | store2 | store3 |
|---|---|---|---|
| avocado | NaN | 16 | NaN |


In [62]:
```python
type(subdf)
```

pandas.core.frame.DataFrame
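A minimal sketch of `loc` with both a row and a column selection, and of the `KeyError` raised when a column label is used in the row position. The two-row DataFrame is a hand-built assumption mirroring part of the example:

```python
import pandas as pd

# Hand-built stand-in with two of the example rows
df = pd.DataFrame(
    {'store1': [float('nan'), 71.0], 'store2': [16, 92]},
    index=['avocado', 'kiwi'],
)

# Rows only: a list of row labels returns a DataFrame
rows = df.loc[['avocado']]

# Rows and columns: df.loc[row_selection, column_selection]
block = df.loc[['kiwi'], ['store2']]

# A column label in the row position is looked up in the index and fails
raised = False
try:
    df.loc[['store1']]
except KeyError:
    raised = True

print(rows.shape, block.shape, raised)
```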

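To add rows to a DataFrame, older pandas versions offered the `append` method, which appends one DataFrame to another. `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0; the replacement is `pd.concat`, which likewise returns a new DataFrame and leaves `df` unchanged.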

In [63]:
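```python
new_df = pd.DataFrame([{'store1': 20, 'store2': 3, 'store3': 0}], index=['orange'])
# DataFrame.append no longer exists in pandas 2.x; pd.concat gives the same result
pd.concat([df, new_df])
```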

| | store1 | store2 | store3 |
|---|---|---|---|
| avocado | NaN | 16 | NaN |
| kiwi | 71.0 | 92 | 26.0 |
| mango | 83.0 | 70 | 99.0 |
| papaya | 75.0 | 21 | 27.0 |
| orange | 20.0 | 3 | 0.0 |
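Because `pd.concat` returns a new object, the original DataFrame keeps its rows unless the result is assigned back. A small sketch on hand-built data (an assumption, not the random values above):

```python
import pandas as pd

# Hand-built stand-in for df and the new row
df = pd.DataFrame({'store1': [71.0, 83.0], 'store2': [92, 70]},
                  index=['kiwi', 'mango'])
new_df = pd.DataFrame([{'store1': 20, 'store2': 3}], index=['orange'])

# concat builds a new DataFrame; df itself is not modified
combined = pd.concat([df, new_df])

print(len(df), len(combined))
print(list(combined.index))
```

To keep the new row, reassign the result, e.g. `df = pd.concat([df, new_df])`.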


Conversely, to add a new column, assign an array containing one value per row to a new column label:

In [64]:
```python
df['store4'] = np.random.randint(1,30,size=4)
```

In [65]:
```python
df
```

| | store1 | store2 | store3 | store4 |
|---|---|---|---|---|
| avocado | NaN | 16 | NaN | 3 |
| kiwi | 71.0 | 92 | 26.0 | 14 |
| mango | 83.0 | 70 | 99.0 | 3 |
| papaya | 75.0 | 21 | 27.0 | 8 |
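A sketch of the length rule for column assignment, on a hand-built three-row DataFrame (an assumption): an array must match the number of rows, a scalar is repeated on every row, and a wrong-length array is rejected with a `ValueError`. The column names `open` and `bad` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in with three rows
df = pd.DataFrame({'store1': [71.0, 83.0, 75.0]},
                  index=['kiwi', 'mango', 'papaya'])

# An array must have exactly one value per row
df['store4'] = np.array([14, 3, 8])

# A scalar is broadcast to every row (hypothetical 'open' column)
df['open'] = True

# A wrong-length array is rejected (hypothetical 'bad' column)
rejected = False
try:
    df['bad'] = np.array([1, 2])
except ValueError:
    rejected = True

print(list(df.columns), rejected)
```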


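Sometimes the given representation doesn't fit the way we want to work with the data, and we need to swap rows and columns.

To do so, we can use the `T` attribute, a shorthand for the `transpose()` method. Note in the output below that the integer values of `store2` and `store4` become floats, because each new column mixes values coming from float and integer columns.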

In [69]:
```python
df = df.T
df
```

| | avocado | kiwi | mango | papaya |
|---|---|---|---|---|
| store1 | NaN | 71.0 | 83.0 | 75.0 |
| store2 | 16.0 | 92.0 | 70.0 | 21.0 |
| store3 | NaN | 26.0 | 99.0 | 27.0 |
| store4 | 3.0 | 14.0 | 3.0 | 8.0 |
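A small sketch showing that `df.T` and `df.transpose()` give the same result, that the shape is swapped, and that mixing a float and an integer column yields float64 columns after transposing. The DataFrame is a hand-built assumption:

```python
import pandas as pd

# Hand-built stand-in: one float column and one integer column
df = pd.DataFrame({'store1': [71.0, 83.0, 75.0], 'store2': [92, 70, 21]},
                  index=['kiwi', 'mango', 'papaya'])

flipped = df.T          # attribute shorthand
same = df.transpose()   # equivalent method call

print(df.shape, flipped.shape)
print(flipped.dtypes)   # every new column is float64
```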


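A new column can also be computed from part of an existing one. Here the `papaya` values from the second row onward are doubled; since the assignment aligns the new values on the index labels, the row without a value (`store1`) gets `NaN`.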

In [75]:
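```python
# iloc[1:] selects rows by position, skipping the first one (store1);
# arithmetic operations work element-wise on Series and DataFrames
df['ogm_papaya'] = df['papaya'].iloc[1:] * 2
df
```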

| | avocado | kiwi | mango | papaya | ogm_papaya |
|---|---|---|---|---|---|
| store1 | NaN | 71.0 | 83.0 | 75.0 | NaN |
| store2 | 16.0 | 92.0 | 70.0 | 21.0 | 42.0 |
| store3 | NaN | 26.0 | 99.0 | 27.0 | 54.0 |
| store4 | 3.0 | 14.0 | 3.0 | 8.0 | 16.0 |
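The index alignment behind the `NaN` in `store1` can be reproduced on a smaller hand-built DataFrame (an assumption standing in for the transposed `df`):

```python
import pandas as pd

# Hand-built stand-in for the transposed example
df = pd.DataFrame({'papaya': [75.0, 21.0, 27.0]},
                  index=['store1', 'store2', 'store3'])

# iloc[1:] keeps every row except the first (positional slice)
doubled = df['papaya'].iloc[1:] * 2

# Assignment aligns on index labels; 'store1' has no value, so it gets NaN
df['ogm_papaya'] = doubled
print(df)
```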


Another approach is the `insert(loc, column, value)` method, which inserts a new column at integer position `loc`, with label `column` and the given `value`.

If a column with that label already exists, `insert` raises a `ValueError` (unless `allow_duplicates=True` is passed). Conversely, assigning with `df[label] = ...`, as in the previous examples, overwrites the existing column.

In [84]:
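```python
df.insert(3, 'nuts', np.random.randint(1,30, 4))
df
```

| | avocado | kiwi | mango | nuts | papaya | ogm_papaya |
|---|---|---|---|---|---|---|
| store1 | NaN | 71.0 | 83.0 | 20 | 75.0 | NaN |
| store2 | 16.0 | 92.0 | 70.0 | 15 | 21.0 | 42.0 |
| store3 | NaN | 26.0 | 99.0 | 3 | 27.0 | 54.0 |
| store4 | 3.0 | 14.0 | 3.0 | 4 | 8.0 | 16.0 |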
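A sketch of the duplicate-label behaviour of `insert`, using a small hand-built DataFrame (an assumption). The first call inserts at position 1; repeating the label fails unless `allow_duplicates=True` is given:

```python
import pandas as pd

# Hand-built stand-in with two columns
df = pd.DataFrame({'kiwi': [71.0, 92.0], 'papaya': [75.0, 21.0]})

# Insert 'nuts' as the second column (position 1)
df.insert(1, 'nuts', [20, 15])

# Inserting an existing label raises ValueError by default
duplicate_rejected = False
try:
    df.insert(0, 'nuts', [0, 0])
except ValueError:
    duplicate_rejected = True

# allow_duplicates=True permits a second 'nuts' column
df.insert(0, 'nuts', [0, 0], allow_duplicates=True)
print(list(df.columns))
```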


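Sometimes we want to remove unneeded information from our DataFrame. To do so, we can use `pop(column_label)`, which removes a single column, or `drop([label1, label2], axis=0|1)`, which removes rows (`axis=0`) or columns (`axis=1`).

`pop` modifies the DataFrame in place and returns the removed column as a Series, while `drop` returns a new DataFrame and leaves the original untouched, so its output must be saved in a variable.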

In [88]:
```python
df.pop('ogm_papaya')
```

```
store1     NaN
store2    42.0
store3    54.0
store4    16.0
Name: ogm_papaya, dtype: float64
```

In [89]:
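```python
df
```

| | avocado | kiwi | mango | nuts | papaya |
|---|---|---|---|---|---|
| store1 | NaN | 71.0 | 83.0 | 20 | 75.0 |
| store2 | 16.0 | 92.0 | 70.0 | 15 | 21.0 |
| store3 | NaN | 26.0 | 99.0 | 3 | 27.0 |
| store4 | 3.0 | 14.0 | 3.0 | 4 | 8.0 |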
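A minimal sketch of `pop` on a hand-built DataFrame (an assumption): the call both returns the column and removes it from the original object.

```python
import pandas as pd

# Hand-built stand-in for the example DataFrame
df = pd.DataFrame({'kiwi': [71.0, 92.0], 'nuts': [20, 15],
                   'ogm_papaya': [float('nan'), 42.0]},
                  index=['store1', 'store2'])

removed = df.pop('ogm_papaya')  # returns the column and deletes it from df

print(type(removed).__name__)
print(list(df.columns))
```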


In [98]:
```python
# deleting columns (axis=1)
df.drop(['nuts', 'avocado'], axis=1)
```

| | kiwi | mango | papaya |
|---|---|---|---|
| store1 | 71.0 | 83.0 | 75.0 |
| store2 | 92.0 | 70.0 | 21.0 |
| store3 | 26.0 | 99.0 | 27.0 |
| store4 | 14.0 | 3.0 | 8.0 |


In [99]:
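```python
# the dataframe isn't affected by the drop operation
df
```

| | avocado | kiwi | mango | nuts | papaya |
|---|---|---|---|---|---|
| store1 | NaN | 71.0 | 83.0 | 20 | 75.0 |
| store2 | 16.0 | 92.0 | 70.0 | 15 | 21.0 |
| store3 | NaN | 26.0 | 99.0 | 3 | 27.0 |
| store4 | 3.0 | 14.0 | 3.0 | 4 | 8.0 |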


In [100]:
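```python
# deleting rows (axis=0)
df.drop(['store4'], axis=0)
```

| | avocado | kiwi | mango | nuts | papaya |
|---|---|---|---|---|---|
| store1 | NaN | 71.0 | 83.0 | 20 | 75.0 |
| store2 | 16.0 | 92.0 | 70.0 | 15 | 21.0 |
| store3 | NaN | 26.0 | 99.0 | 3 | 27.0 |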
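`drop` also accepts the keyword arguments `columns=` and `index=` as alternatives to `axis=1` and `axis=0`. A sketch on a hand-built DataFrame (an assumption), including reassignment to keep the result:

```python
import pandas as pd

# Hand-built stand-in for the example DataFrame
df = pd.DataFrame({'avocado': [float('nan'), 16.0, 3.0],
                   'kiwi': [71.0, 92.0, 14.0],
                   'nuts': [20, 15, 4]},
                  index=['store1', 'store2', 'store4'])

# Keyword forms, equivalent to axis=1 and axis=0 respectively
no_cols = df.drop(columns=['nuts', 'avocado'])
no_rows = df.drop(index=['store4'])

# drop returns a new object, so keep the result by reassigning
df = df.drop(columns=['nuts'])
print(list(df.columns))
```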


Sometimes an imported DataFrame has malformed labels. To rename index (or column) labels we can use the `rename` method, passing a dictionary that maps old labels to new ones.

Like `drop`, `rename` returns a new DataFrame, so its output must be saved in a variable to keep the changes.

In [102]:
```python
renamed_df = df.rename(index={'store3': 'Antonio\'s', 'store2':'Beerhouse'})
renamed_df
```

| | avocado | kiwi | mango | nuts | papaya |
|---|---|---|---|---|---|
| store1 | NaN | 71.0 | 83.0 | 20 | 75.0 |
| Beerhouse | 16.0 | 92.0 | 70.0 | 15 | 21.0 |
| Antonio's | NaN | 26.0 | 99.0 | 3 | 27.0 |
| store4 | 3.0 | 14.0 | 3.0 | 4 | 8.0 |


In [103]:
```python
renamed_df = renamed_df.rename(columns={'papaya':'beers', 'mango':'meatball'})
renamed_df
```

| | avocado | kiwi | meatball | nuts | beers |
|---|---|---|---|---|---|
| store1 | NaN | 71.0 | 83.0 | 20 | 75.0 |
| Beerhouse | 16.0 | 92.0 | 70.0 | 15 | 21.0 |
| Antonio's | NaN | 26.0 | 99.0 | 3 | 27.0 |
| store4 | 3.0 | 14.0 | 3.0 | 4 | 8.0 |
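Besides a dictionary, `rename` also accepts a function, which is applied to every label on the chosen axis. A sketch on a one-row hand-built DataFrame (an assumption):

```python
import pandas as pd

# Hand-built stand-in with one row
df = pd.DataFrame({'mango': [83.0], 'papaya': [75.0]}, index=['store2'])

# A dict maps specific old labels to new ones; unlisted labels are kept
by_dict = df.rename(index={'store2': 'Beerhouse'})

# A function is applied to every column label
by_func = df.rename(columns=str.upper)

print(list(by_dict.index), list(by_func.columns))
```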


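We can also use one of the columns as the index with `set_index`. Like `drop` and `rename`, it returns a new DataFrame; by default the chosen column is removed from the columns, and the previous index labels are discarded.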

In [105]:
```python
renamed_df.set_index('meatball')
```

| meatball | avocado | kiwi | nuts | beers |
|---|---|---|---|---|
| 83.0 | NaN | 71.0 | 20 | 75.0 |
| 70.0 | 16.0 | 92.0 | 15 | 21.0 |
| 99.0 | NaN | 26.0 | 3 | 27.0 |
| 3.0 | 3.0 | 14.0 | 4 | 8.0 |
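A sketch of the `drop=` option of `set_index` and of `reset_index`, which moves the index back into a regular column. The DataFrame is a hand-built assumption:

```python
import pandas as pd

# Hand-built stand-in for renamed_df
df = pd.DataFrame({'kiwi': [71.0, 92.0], 'meatball': [83.0, 70.0]},
                  index=['store1', 'Beerhouse'])

# By default the chosen column is moved into the index (drop=True)
indexed = df.set_index('meatball')

# drop=False also keeps the column among the regular columns
kept = df.set_index('meatball', drop=False)

# reset_index turns the index back into the first column
restored = indexed.reset_index()

print(list(indexed.columns), list(kept.columns), list(restored.columns))
```

Note that `restored` has a new default integer index: the original `store1`/`Beerhouse` labels were discarded by `set_index`.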
