### CREATING DATAFRAME




- df is the variable that holds the DataFrame
- A DataFrame is a tabular data structure, i.e. data arranged in rows and columns
- pd.read_csv takes the path of a CSV file as input and returns the file's contents as a DataFrame
- A DataFrame can also be built from a dictionary instead of a CSV file
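
Both routes can be sketched side by side. This is a minimal illustration, not the notebook's own cell: the CSV text is read from an in-memory buffer so the snippet runs without weather_data.csv on disk, and the names csv_text, df_from_csv and df_from_dict are made up for the example.

```python
import io
import pandas as pd

# Illustrative CSV text; in the notebook this content would live in a file.
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
"""

# read_csv accepts a file path or any file-like object.
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# The same table built from a dictionary: keys become column names,
# and each list holds the values of that column.
df_from_dict = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017"],
    "temperature": [32, 35],
    "windspeed": [6, 7],
    "event": ["Rain", "Sunny"],
})

print(df_from_csv)
print(df_from_dict)
```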

###

1. Creating df from a CSV file


In [2]:
import pandas as pd

df = pd.read_csv("weather_data.csv")
df                                                     

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


#

2. Creating df from a dictionary

- The weather_data dictionary is converted into a DataFrame: each key becomes a column name and each list becomes that column's values

In [3]:
import pandas as pd

weather_data = {
    
    'day' : ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature' : [34, 44, 55, 66, 77, 26],
    'windspeed' : [6, 7, 2, 7, 4, 2],
    'event' : ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny']
}

df = pd.DataFrame(weather_data)
df

,day,temperature,windspeed,event
0,1/1/2017,34,6,Rain
1,1/2/2017,44,7,Sunny
2,1/3/2017,55,2,Snow
3,1/4/2017,66,7,Snow
4,1/5/2017,77,4,Rain
5,1/6/2017,26,2,Sunny


#

- df.shape gives the dimensions of the DataFrame as a tuple: (total rows, total columns)

In [4]:
df.shape

(6, 4)

In [5]:
rows, columns = df.shape
print(rows,"\n")
print(columns)

6 

4
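
As a small sketch of how the shape tuple relates to other size queries, the snippet below uses a two-column frame made up for illustration. The calls len(df), len(df.columns) and df.size are standard pandas, but they are not used elsewhere in this notebook.

```python
import pandas as pd

df = pd.DataFrame({
    "temperature": [34, 44, 55, 66, 77, 26],
    "windspeed": [6, 7, 2, 7, 4, 2],
})

rows, columns = df.shape   # shape is a plain (rows, columns) tuple
print(len(df))             # number of rows, same as df.shape[0]
print(len(df.columns))     # number of columns, same as df.shape[1]
print(df.size)             # total number of cells: rows * columns
```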


####

- df.head() returns the first rows of the DataFrame (5 by default)
- Passing an argument n, as in df.head(n), returns the first n rows

In [6]:
df.head(3)

,day,temperature,windspeed,event
0,1/1/2017,34,6,Rain
1,1/2/2017,44,7,Sunny
2,1/3/2017,55,2,Snow


###

- df.tail() works like df.head() but returns the last rows of the DataFrame (5 by default)
- Passing an argument n, as in df.tail(n), returns the last n rows

In [7]:
df.tail(3)

,day,temperature,windspeed,event
3,1/4/2017,66,7,Snow
4,1/5/2017,77,4,Rain
5,1/6/2017,26,2,Sunny
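
The defaults are easier to see on a frame with more than five rows. The sketch below assumes a made-up ten-row frame with a single value column, so it is an illustration rather than part of the weather example.

```python
import pandas as pd

df = pd.DataFrame({"value": range(10)})   # ten rows, index 0..9

first_default = df.head()        # no argument: first 5 rows
last_default = df.tail()         # no argument: last 5 rows
last_two = df.tail(2)            # last 2 rows (index 8 and 9)
all_but_last_two = df.head(-2)   # negative n: everything except the last 2 rows

print(first_default)
print(last_two)
```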


### USING INDEXING AND SLICING IN DATAFRAME

#

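Syntax:                  df[start_index : (end_index + 1)]

- The slice works on row positions and excludes the stop value, so df[2:5] returns the rows at positions 2, 3 and 4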

In [8]:
df[2:5]

,day,temperature,windspeed,event
2,1/3/2017,55,2,Snow
3,1/4/2017,66,7,Snow
4,1/5/2017,77,4,Rain
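
To make the slicing rule concrete, here is a minimal sketch on a one-column frame made up for the example. It compares the bracket slice with df.iloc, the explicit positional indexer, which this notebook does not otherwise use.

```python
import pandas as pd

df = pd.DataFrame({"temperature": [34, 44, 55, 66, 77, 26]})

by_brackets = df[2:5]     # rows at positions 2, 3, 4; position 5 is excluded
by_iloc = df.iloc[2:5]    # the explicit positional equivalent
every_other = df[::2]     # a step is allowed too: positions 0, 2, 4

print(by_brackets.equals(by_iloc))
```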


#

- df.columns returns the column labels (names) of the DataFrame, not their count

In [9]:
df.columns

Index(['day', 'temperature', 'windspeed', 'event'], dtype='object')
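
If the number of columns or a plain list of names is what you need, the Index object can be converted or measured directly. A minimal sketch, using a one-row frame made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["1/1/2017"], "temperature": [34],
    "windspeed": [6], "event": ["Rain"],
})

column_names = list(df.columns)     # the labels as a plain Python list
column_count = len(df.columns)      # how many columns there are
has_event = "event" in df.columns   # membership test on the labels

print(column_names, column_count, has_event)
```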

#

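- df.column_name returns a single column

- df.day is equivalent to df['day']; the bracket form is needed when a column name contains spaces or clashes with a DataFrame attribute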

In [10]:
df.day

0    1/1/2017
1    1/2/2017
2    1/3/2017
3    1/4/2017
4    1/5/2017
5    1/6/2017
Name: day, dtype: object

In [11]:
df['event']

0     Rain
1    Sunny
2     Snow
3     Snow
4     Rain
5    Sunny
Name: event, dtype: object
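
The two access styles are not always interchangeable. The sketch below uses hypothetical column names ("wind speed" with a space, and "count") chosen only to show where attribute access breaks down. They do not appear in the weather data.

```python
import pandas as pd

# "wind speed" and "count" are hypothetical column names for illustration
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017"],
    "wind speed": [6, 7],
    "count": [1, 2],
})

same = df.day.equals(df["day"])   # both forms return the same Series

wind = df["wind speed"]           # df.wind speed would be a syntax error
counts = df["count"]              # df.count is the DataFrame.count method
is_method = callable(df.count)    # so attribute access does not reach the column

print(same, is_method)
```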

###

- Each column of a DataFrame is a pandas Series

In [12]:
type(df['event'])

pandas.core.series.Series

###


- df[['column name #1', 'column name #2', ...]] selects several columns together and returns a new DataFrame
- Useful for narrowing the data down to the columns needed for an analysis

In [13]:
df[['event','day']]

,event,day
0,Rain,1/1/2017
1,Sunny,1/2/2017
2,Snow,1/3/2017
3,Snow,1/4/2017
4,Rain,1/5/2017
5,Sunny,1/6/2017
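
The difference between single and double brackets shows up in the type of the result. A minimal sketch, on a two-row frame made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017"],
    "event": ["Rain", "Sunny"],
})

one_column = df["event"]             # single brackets, one label: a Series
one_column_frame = df[["event"]]     # double brackets: a DataFrame with one column
two_columns = df[["event", "day"]]   # columns come back in the order requested

print(type(one_column).__name__, type(one_column_frame).__name__)
```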


### OPERATIONS IN df

### Here's a link to the rest of the pandas Series operations

https://pandas.pydata.org/docs/reference/api/pandas.Series.html

###

- max() :- df['column name'].max() returns the maximum value in that column

- min() :- df['column name'].min() returns the minimum value in that column

- mean() :- df['column name'].mean() returns the average value of that column

- std() :- df['column name'].std() returns the sample standard deviation of that column

In [14]:
df['temperature'].max()

77

In [15]:
df['temperature'].min()

26

In [16]:
df['temperature'].mean()

50.333333333333336
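
std() is listed above but not demonstrated, so here is a minimal sketch using the same temperature values. The ddof argument is a standard pandas parameter. The second call is added only to contrast the sample and population versions and is not part of the notebook.

```python
import pandas as pd

temperature = pd.Series([34, 44, 55, 66, 77, 26], name="temperature")

sample_std = temperature.std()             # sample standard deviation (ddof=1, the default)
population_std = temperature.std(ddof=0)   # population standard deviation

print(sample_std)
print(population_std)
```

The first value should agree with the std row that df.describe() reports for temperature.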

###

- df.describe() returns summary statistics for the numeric columns

In [17]:
df.describe()

,temperature,windspeed
count,6.0,6.0
mean,50.333333,4.666667
std,19.376962,2.33809
min,26.0,2.0
25%,36.5,2.5
50%,49.5,5.0
75%,63.25,6.75
max,77.0,7.0
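
Text columns such as event are skipped by default. As a hedged sketch, the snippet below shows two standard ways to summarise them as well. The trimmed-down frame and the variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "temperature": [34, 44, 55, 66, 77, 26],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

numeric_summary = df.describe()             # numeric columns only
event_summary = df["event"].describe()      # count, unique, top, freq
full_summary = df.describe(include="all")   # every column; NaN where a statistic does not apply

print(event_summary)
```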


### CONDITIONAL STATEMENTS

###

- df[condition] keeps only the rows for which the condition is True, e.g. df[df.column_name >= value]

In [18]:
df[df.temperature>=45]

,day,temperature,windspeed,event
2,1/3/2017,55,2,Snow
3,1/4/2017,66,7,Snow
4,1/5/2017,77,4,Rain
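
The condition inside the brackets is itself a boolean Series, and several conditions can be combined. A minimal sketch on a trimmed-down copy of the data. The combined filters are illustrative and not taken from the notebook.

```python
import pandas as pd

df = pd.DataFrame({
    "temperature": [34, 44, 55, 66, 77, 26],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

mask = df["temperature"] >= 45   # boolean Series, one value per row
warm = df[mask]                  # keeps the rows where the mask is True

# & means "and", | means "or"; each condition needs its own parentheses
warm_snow = df[(df["temperature"] >= 45) & (df["event"] == "Snow")]
cold_or_rain = df[(df["temperature"] < 30) | (df["event"] == "Rain")]

print(warm_snow)
print(cold_or_rain)
```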


In [19]:
df[df.temperature == df.temperature.max()]

,day,temperature,windspeed,event
4,1/5/2017,77,4,Rain


In [20]:
df[df.temperature == df["temperature"].max()]

,day,temperature,windspeed,event
4,1/5/2017,77,4,Rain


In [21]:
df["day"][df.temperature == df["temperature"].max()]

4    1/5/2017
Name: day, dtype: object

In [22]:
df[["day","temperature"]][df.temperature == df["temperature"].max()]

,day,temperature
4,1/5/2017,77
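
The last two cells pick columns first and filter rows second. The same result can be written in one step with df.loc, which takes the row condition and the column labels together. This is a sketch of an alternative spelling, not a correction of the cells above. idxmax is a standard Series method that the notebook does not otherwise use.

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [34, 44, 55, 66, 77, 26],
})

hottest = df["temperature"] == df["temperature"].max()

hottest_day = df.loc[hottest, "day"]                     # a Series
hottest_rows = df.loc[hottest, ["day", "temperature"]]   # a DataFrame

# idxmax returns the index label of the first maximum
label = df["temperature"].idxmax()

print(hottest_rows)
```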


### set_index() method

###

- df.index returns the index (row labels); by default it is a RangeIndex starting at 0
- df.set_index('column name') makes the specified column the index
- df.set_index() returns a new DataFrame and leaves df unchanged
- To modify df itself, pass inplace=True (or reassign the result, e.g. df = df.set_index('day'))
- df.loc[label] returns the row with the given index label (note the square brackets)
- df.reset_index() restores the default integer index and turns the old index back into a regular column

In [23]:
df

,day,temperature,windspeed,event
0,1/1/2017,34,6,Rain
1,1/2/2017,44,7,Sunny
2,1/3/2017,55,2,Snow
3,1/4/2017,66,7,Snow
4,1/5/2017,77,4,Rain
5,1/6/2017,26,2,Sunny


In [24]:
df.index

RangeIndex(start=0, stop=6, step=1)

In [3]:
df.set_index('day', inplace=True)

In [4]:
df.set_index('temperature')

temperature,windspeed,event
32,6,Rain
35,7,Sunny
28,2,Snow
24,7,Snow
32,4,Rain
31,2,Sunny


In [27]:
df

day,temperature,windspeed,event
1/1/2017,34,6,Rain
1/2/2017,44,7,Sunny
1/3/2017,55,2,Snow
1/4/2017,66,7,Snow
1/5/2017,77,4,Rain
1/6/2017,26,2,Sunny


In [28]:
df.loc['1/3/2017']

temperature      55
windspeed         2
event          Snow
Name: 1/3/2017, dtype: object

In [29]:
df.reset_index(inplace=True)

In [30]:
df

,day,temperature,windspeed,event
0,1/1/2017,34,6,Rain
1,1/2/2017,44,7,Sunny
2,1/3/2017,55,2,Snow
3,1/4/2017,66,7,Snow
4,1/5/2017,77,4,Rain
5,1/6/2017,26,2,Sunny
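
The same round trip can be done without inplace=True by keeping the returned DataFrames in new variables. A minimal sketch on a three-row copy of the data. The names by_day, row, temp and restored are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017"],
    "temperature": [34, 44, 55],
    "event": ["Rain", "Sunny", "Snow"],
})

by_day = df.set_index("day")                   # new DataFrame; df itself is unchanged
row = by_day.loc["1/3/2017"]                   # whole row as a Series
temp = by_day.loc["1/3/2017", "temperature"]   # a single cell: row label, column label

restored = by_day.reset_index()                # "day" becomes an ordinary column again

print(row)
print(restored)
```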
