In [1]:
import pandas as pd
import numpy as np
from numpy.random import randn

In [2]:
np.random.seed(101)   
# A fixed seed makes the random numbers reproducible: every run, for every user, produces the same values
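
A minimal sketch of what the seed does, assuming only NumPy is installed. The variable names `first` and `second` are illustrative and not part of the notebook:

```python
import numpy as np

np.random.seed(101)
first = np.random.randn(3)    # three draws from the standard normal distribution

np.random.seed(101)           # reseeding restarts the same pseudo-random sequence
second = np.random.randn(3)

print(np.array_equal(first, second))  # True
```

Without the second call to seed(), the two arrays would hold different values.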

#### What are dataframes:
A data frame is a table-like data structure available in languages like R and Python. Statisticians, scientists, and programmers use them in data analysis code.
#### Why are dataframes important:
They cover the common steps of a data analysis. With a data frame you can:
- Create a data frame from the results of an SQL query, or from a CSV file. The columns have types like string, number, and date.
- Filter a data frame down to the rows and columns of interest.
- Clean its values with arithmetic and string operations.
- Summarize groups of rows.
- Compute new columns based on existing columns.
- Join a data frame with others, for further analysis.
- Plot one column vs. another (in many different ways).
- Mathematically model one column as a function of another, e.g. with linear regression.

For more information, refer to the following link: https://www.oilshell.org/blog/2018/11/30.html
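
A few of the operations listed above can be sketched on a tiny hand-made table. The `sales` data and its column names are made up for illustration only:

```python
import pandas as pd

# Hypothetical data, invented for this sketch.
sales = pd.DataFrame({
    "city": ["Cairo", "Cairo", "Giza", "Giza"],
    "units": [10, 20, 5, 15],
    "price": [2.0, 2.0, 3.0, 3.0],
})

# Compute a new column based on existing columns.
sales["revenue"] = sales["units"] * sales["price"]

# Filter down to the rows of interest.
big = sales[sales["units"] >= 10]

# Summarize groups of rows.
totals = sales.groupby("city")["revenue"].sum()

print(len(big))          # 3
print(totals["Cairo"])   # 60.0
print(totals["Giza"])    # 60.0
```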

In [3]:
df1 = pd.DataFrame(randn(5,4), index='A B C D E'.split(), columns='w x y z'.split())

# Notice that a data frame is a table-like structure of rows and columns.
# 'A B C D E'.split() is a shorter way to write the list ['A', 'B', 'C', 'D', 'E'] (less typing, same result)
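
As a quick check of the split() shortcut, this small sketch compares it with the list written out by hand (the name `labels` is illustrative):

```python
labels = 'A B C D E'.split()   # split on whitespace by default

print(labels)                                # ['A', 'B', 'C', 'D', 'E']
print(labels == ['A', 'B', 'C', 'D', 'E'])   # True
```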


In [8]:
df1

Unnamed: 0,w,x,y,z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


### To get a specific row or column:
To get a column, just specify its label in square brackets (rows are selected with loc and iloc, shown further down):


In [9]:
df1['x']

A    0.628133
B   -0.319318
C    0.740122
D   -0.758872
E    1.978757
Name: x, dtype: float64

Notice that the result also includes the column name and its data type. The result is a Series: each row label is paired with its value.
A Series holds only one column, not multiple ones.
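
To make the label-data pairing concrete, here is a minimal sketch on a small hand-made dataframe (the names `df` and `col` and the values are illustrative, chosen so the output is easy to check):

```python
import pandas as pd

df = pd.DataFrame({"w": [1.0, 2.0], "x": [3.0, 4.0]}, index=["A", "B"])

col = df["x"]            # selecting one column gives a Series

print(col.name)          # x
print(list(col.index))   # ['A', 'B']  (the row labels of the dataframe)
print(col["B"])          # 4.0         (look up one value by its row label)
```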


Notice that we can also check the type of a variable, in our case a column taken from the dataframe, using the 'type()' function. Example:

In [13]:
type(df1['y'])

pandas.core.series.Series

Also notice that square brackets only look up column labels (w, x, y, z), not row labels from the index (A, B, C, D, E). Passing a row label raises an error, as in:

In [14]:
df1['A']

KeyError: 'A'
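
One way to avoid this error is to check where a label lives before using it. The sketch below uses a small hand-made dataframe (illustrative names and values) and relies on the fact that the `in` operator on a dataframe tests column labels:

```python
import pandas as pd

df = pd.DataFrame({"w": [1.0, 2.0], "x": [3.0, 4.0]}, index=["A", "B"])

print("x" in df)         # True:  'x' is a column label
print("A" in df)         # False: 'A' is a row label, not a column
print("A" in df.index)   # True

try:
    df["A"]              # square brackets search the columns only
except KeyError:
    print("no column labelled 'A'")

row = df.loc["A"]        # row labels go through loc instead
print(row["w"])          # 1.0
```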

To access more than one column, pass a list of column labels:

In [16]:
df1[['x', 'w']]  # columns come back in the order listed, so this also lets you reorder them

Unnamed: 0,x,w
A,0.628133,2.70685
B,-0.319318,0.651118
C,0.740122,-2.018168
D,-0.758872,0.188695
E,1.978757,0.190794


We can also add new columns to the dataframe:

In [29]:
df1['new'] = df1['x']+df1['w']
df1

Unnamed: 0,w,x,y,z,new
A,2.70685,0.628133,0.907969,0.503826,3.334983
B,0.651118,-0.319318,-0.848077,0.605965,0.3318
C,-2.018168,0.740122,0.528813,-0.589001,-1.278046
D,0.188695,-0.758872,-0.933237,0.955057,-0.570177
E,0.190794,1.978757,2.605967,0.683509,2.169552
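
As a sketch of the same idea on data that is easy to verify by hand, the example below adds a column by assignment and, as an alternative, with assign(), which returns a new dataframe instead of modifying the original. The column name `ratio` is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"w": [1.0, 2.0], "x": [3.0, 4.0]}, index=["A", "B"])

# Assignment adds the column to df itself; values are matched up by row label.
df["new"] = df["x"] + df["w"]

# assign() leaves df untouched and returns a copy with the extra column.
with_ratio = df.assign(ratio=df["x"] / df["w"])

print(list(df.columns))          # ['w', 'x', 'new']
print(list(with_ratio.columns))  # ['w', 'x', 'new', 'ratio']
print(df.loc["A", "new"])        # 4.0
```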


We can delete a column using the drop() function. Notice that 'df1.drop('new')' on its own would give an error, because by default drop() searches for 'new' among the row labels (the index) of the dataframe. To drop a column, pass 'axis=1', which selects the second axis (the column axis).

Notice: the default value of the axis parameter is 0 (rows).

Example: 

In [30]:
df1.drop('new', axis=1)

Unnamed: 0,w,x,y,z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509
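
The effect of the axis argument can be sketched on a small hand-made dataframe (illustrative names and values). The sketch also uses the `columns=` keyword of drop(), which is an equivalent spelling of `axis=1`:

```python
import pandas as pd

df = pd.DataFrame({"w": [1, 2], "new": [3, 4]}, index=["A", "B"])

try:
    df.drop("new")                    # axis=0 by default: looks for a ROW labelled 'new'
except KeyError:
    print("no row labelled 'new'")

by_axis = df.drop("new", axis=1)      # drop the column
by_keyword = df.drop(columns="new")   # same operation, spelled with a keyword

print(by_axis.equals(by_keyword))     # True
print(list(by_axis.columns))          # ['w']
```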


In [31]:
df1

Unnamed: 0,w,x,y,z,new
A,2.70685,0.628133,0.907969,0.503826,3.334983
B,0.651118,-0.319318,-0.848077,0.605965,0.3318
C,-2.018168,0.740122,0.528813,-0.589001,-1.278046
D,0.188695,-0.758872,-0.933237,0.955057,-0.570177
E,0.190794,1.978757,2.605967,0.683509,2.169552


Also notice that the previous form of drop() returns a new dataframe and leaves df1 unchanged, so 'new' is still there when df1 is reused. If you wish to delete it permanently, set the inplace flag, which is False by default.

Example:

In [32]:
df1.drop('new', axis=1, inplace=True)
df1

Unnamed: 0,w,x,y,z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509
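
A minimal sketch of the difference, on a small hand-made dataframe (illustrative names and values). It also shows the common alternative of keeping the returned dataframe in a variable instead of using inplace:

```python
import pandas as pd

df = pd.DataFrame({"w": [1, 2], "new": [3, 4]}, index=["A", "B"])

trimmed = df.drop("new", axis=1)     # returns a new dataframe
print(list(df.columns))              # ['w', 'new']  (df itself is unchanged)
print(list(trimmed.columns))         # ['w']

result = df.drop("new", axis=1, inplace=True)
print(result)                        # None  (with inplace=True nothing is returned)
print(list(df.columns))              # ['w']  (df itself was modified)
```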


In [33]:
df1.drop('E')

Unnamed: 0,w,x,y,z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057


In [34]:
df1

Unnamed: 0,w,x,y,z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [35]:
df1.drop('E', inplace=True)

In [36]:
df1

Unnamed: 0,w,x,y,z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057


Now, to select a row or a number of rows from the dataframe, there are 2 methods:

1) access by row label, using loc:

In [37]:
df1.loc['A']

w    2.706850
x    0.628133
y    0.907969
z    0.503826
Name: A, dtype: float64

2) access by integer position, using iloc:

In [39]:
df1.iloc[0]

w    2.706850
x    0.628133
y    0.907969
z    0.503826
Name: A, dtype: float64

Notice that a row is also returned as a Series, indexed by the column names.
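
The difference between the two methods is easiest to see when the row labels are themselves integers. The sketch below uses a hand-made dataframe whose labels are 10, 20, 30 (illustrative values, not from the notebook):

```python
import pandas as pd

df = pd.DataFrame({"w": [1.0, 2.0, 3.0]}, index=[10, 20, 30])

print(df.loc[10, "w"])    # 1.0  (row whose LABEL is 10)
print(df.iloc[0, 0])      # 1.0  (row at POSITION 0)
print(df.iloc[-1, 0])     # 3.0  (positions can count from the end)

try:
    df.loc[0]             # there is no row labelled 0
except KeyError:
    print("no row labelled 0")
```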

To take a specific element, or a subset of elements, pass both the row labels and the column labels to loc:

In [40]:
df1.loc['A','y']

0.9079694464765431

In [41]:
df1.loc[['A','B'], ['w','y']]    # ['A','B']:list of rows to include, ['w','y']:list of columns to include

Unnamed: 0,w,y
A,2.70685,0.907969
B,0.651118,-0.848077
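
Subsets can also be taken with slices. As a sketch on a small hand-made dataframe (illustrative names and values): label slices with loc include both end points, while position slices with iloc exclude the stop, as in ordinary Python slicing. The at accessor used at the end is a pandas shortcut for reading a single value by label:

```python
import pandas as pd

df = pd.DataFrame(
    {"w": [1, 2, 3], "x": [4, 5, 6], "y": [7, 8, 9]},
    index=["A", "B", "C"],
)

by_label = df.loc["A":"B", ["w", "y"]]   # rows A through B (inclusive), columns w and y
by_position = df.iloc[0:2, [0, 2]]       # rows 0 and 1 (stop excluded), columns 0 and 2

print(by_label.equals(by_position))      # True
print(df.at["A", "y"])                   # 7
```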
