# Accessing Elements in Pandas DataFrame

Pandas DataFrames are two-dimensional data structures with labeled rows and columns that can hold many data types. The DataFrame is the second data structure provided by Pandas (after the Series), and it's well suited to more complicated representations, such as a set of features for several elements.

In [1]:
import pandas as pd
import numpy as np

The easiest way to create a DataFrame is from a dictionary whose values are Pandas Series:

In [2]:
data = {
    'store1': pd.Series(data=np.random.randint(0,100,size=3), index=['mango','papaya','kiwi']),
    'store2': pd.Series(data=np.random.randint(0,100,size=4), index=['mango','papaya', 'avocado', 'kiwi']),
    'store3': pd.Series(data=np.random.randint(0,100,size=3), index=['mango','papaya','kiwi'])
}
type(data)

dict

In [3]:
df = pd.DataFrame(data)

In [4]:
df

| | store1 | store2 | store3 |
|---|---|---|---|
| avocado | NaN | 73 | NaN |
| kiwi | 59.0 | 82 | 64.0 |
| mango | 78.0 | 74 | 37.0 |
| papaya | 5.0 | 97 | 0.0 |


A few things to notice:
- the row labels of the DataFrame are built from the union of the index labels of the three Pandas Series we used to construct the dictionary;
- the column labels of the DataFrame are taken from the keys of the dictionary;
- `NaN` stands for **Not a Number**, and is Pandas' way of indicating that it doesn't have a value for that particular row and column index. Pandas adds this value automatically when the information for a specific row-column pair is missing.
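To check where the missing values ended up, a small sketch along these lines may help. The stock counts below are fixed, made-up numbers (not the random values generated above) so that the result is reproducible, and the names `stores`, `missing_mask` and `missing_per_column` are chosen only for this example.

```python
import pandas as pd

# Fixed, illustrative stock counts (not the random values used above)
data = {
    'store1': pd.Series([59, 78, 5], index=['kiwi', 'mango', 'papaya']),
    'store2': pd.Series([73, 82, 74, 97],
                        index=['avocado', 'kiwi', 'mango', 'papaya']),
}
stores = pd.DataFrame(data)

# Boolean mask: True wherever a value is missing
missing_mask = stores.isna()

# Number of missing values in each column
missing_per_column = missing_mask.sum()
print(missing_per_column)

# A column that contains NaN is stored as float, even if it was built from integers
print(stores.dtypes)
```

This is also why a column with a missing entry is displayed with decimals (such as `59.0`), while a complete integer column keeps its integer values.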

If the Series are created without an index, default integer labels (0, 1, 2, ...) are used for the rows:

In [5]:
data_no_index = {
    'store1': pd.Series(data=np.random.randint(0,100,size=3)),
    'store2': pd.Series(data=np.random.randint(0,100,size=4)),
    'store3': pd.Series(data=np.random.randint(0,100,size=3))
}
df_no_index = pd.DataFrame(data_no_index)

In [6]:
df_no_index

| | store1 | store2 | store3 |
|---|---|---|---|
| 0 | 90.0 | 1 | 73.0 |
| 1 | 11.0 | 2 | 36.0 |
| 2 | 40.0 | 28 | 73.0 |
| 3 | NaN | 99 | NaN |


It's also possible to create a DataFrame from a dictionary of lists (or arrays), provided that all the lists have the same length.

In [7]:
data = {'Integers' : [1,2,3],
        'Floats' : [4.5, 8.2, 9.6]}

# We create a DataFrame 
df = pd.DataFrame(data)

# We display the DataFrame
df

| | Integers | Floats |
|---|---|---|
| 0 | 1 | 4.5 |
| 1 | 2 | 8.2 |
| 2 | 3 | 9.6 |
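As a minimal sketch of two details of this approach: row labels can be supplied through the `index` parameter of `pd.DataFrame`, and lists of different lengths are rejected. The labels `row1`, `row2`, `row3` and the name `labeled_df` are hypothetical, chosen only for this example.

```python
import pandas as pd

data = {'Integers': [1, 2, 3],
        'Floats': [4.5, 8.2, 9.6]}

# Hypothetical row labels, passed through the index parameter
labeled_df = pd.DataFrame(data, index=['row1', 'row2', 'row3'])
print(labeled_df)

# Lists of different lengths cannot be combined and raise a ValueError
try:
    pd.DataFrame({'Integers': [1, 2, 3], 'Floats': [4.5, 8.2]})
except ValueError as err:
    print('Could not build the DataFrame:', err)
```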


Finally, another way to create a Pandas DataFrame is to use a list of Python dictionaries, where each dictionary becomes one row:

In [8]:
data_ex = [{'a': 20, 'b': 30, 'c': 35}, 
          {'a': 10, 'b': 50, 'c': 15, 'd':5}]

# We create a DataFrame 
store_ex = pd.DataFrame(data_ex, index=['label1','label2'])

# We display the DataFrame
store_ex

| | a | b | c | d |
|---|---|---|---|---|
| label1 | 20 | 30 | 35 | NaN |
| label2 | 10 | 50 | 15 | 5.0 |


Just like with Pandas Series, we can extract information from DataFrames using attributes and methods.

The `head()` method returns the first rows of the DataFrame (five by default). It's very useful for getting a quick idea of the information stored in it.

In [9]:
df.head()

| | Integers | Floats |
|---|---|---|
| 0 | 1 | 4.5 |
| 1 | 2 | 8.2 |
| 2 | 3 | 9.6 |


With a large DataFrame, we can also look at its **tail** to see the last rows:

In [10]:
df.tail()

| | Integers | Floats |
|---|---|---|
| 0 | 1 | 4.5 |
| 1 | 2 | 8.2 |
| 2 | 3 | 9.6 |
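Because `df` has only three rows, `head()` and `tail()` return the same rows here. The following sketch uses a hypothetical 10-row DataFrame (named `numbers` only for this example) to show the difference, and passes the optional number of rows to each method.

```python
import pandas as pd

# A hypothetical 10-row DataFrame, large enough for head and tail to differ
numbers = pd.DataFrame({'n': range(10),
                        'square': [i ** 2 for i in range(10)]})

first_three = numbers.head(3)    # rows labeled 0, 1, 2
last_two = numbers.tail(2)       # rows labeled 8, 9
default_head = numbers.head()    # first five rows when no argument is given

print(first_three)
print(last_two)
```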


Moreover, it's possible to:
- get the df's column index

In [11]:
df.columns

Index(['Integers', 'Floats'], dtype='object')

- get only the df's values

In [12]:
df.values

array([[1. , 4.5],
       [2. , 8.2],
       [3. , 9.6]])

- get the df's row index 

In [13]:
df.index

RangeIndex(start=0, stop=3, step=1)

- get information about the shape, number of dimensions, and total number of elements in the DataFrame

In [14]:
df.shape

(3, 2)

In [15]:
df.ndim

2

In [16]:
df.size

6
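The attributes above are related to each other, as the following sketch illustrates. It rebuilds the same `df` so that it runs on its own. It also uses `to_numpy()`, a standard pandas method that returns the same data as `.values`; the variable names `n_rows`, `n_cols` and `arr` are chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'Integers': [1, 2, 3], 'Floats': [4.5, 8.2, 9.6]})

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape

# size is the total number of cells, i.e. rows times columns
assert df.size == n_rows * n_cols

# ndim equals the length of the shape tuple (always 2 for a DataFrame)
assert df.ndim == len(df.shape)

# Mixing an integer column and a float column yields one float array,
# which is why the integers appear as 1., 2., 3. in df.values
arr = df.to_numpy()
print(arr.dtype)
```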