<h1> DataFrame basics</h1>
    
<br>The DataFrame is the main data structure in pandas. It represents tabular data with rows and columns, much like an Excel spreadsheet.<br>

<a class='anchor' id='top'></a>

<h2> Contents </h2>


* [Contents](#top)
* [Creating DataFrame](#createdf)
* [Dealing with rows and columns](#dealing)
* [Indexing and Slicing](#slice)
* [Insert new cell in current cell](#insert)
* [Finding DataFrame Type](#dtype)
* [Operations: min, max, std, describe](#operations)
* [Conditional Selection](#conditionals)
* [Set_Index](#set_index)

<b>The first thing we do is import pandas</b>:

In [5]:
import pandas as pd

<hr/>

* [<h6><ins>back to top</ins></h6>](#top)
<a class='anchor' id='createdf'></a>
<h2>Creating DataFrame</h2> 

You can create a DataFrame by importing data from a file or by building it from a Python dictionary.

<b> Creating a DataFrame by importing</b>:

In [4]:
df = pd.read_csv(r'C:\Users\Work\Desktop\Python Lessons\Data Science\Data Science w Py Course\Data For Use\weather.csv')
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,32,2,Sunny
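
If you do not have <code>weather.csv</code> on disk, the import step can still be tried out. The sketch below is only an illustration: it assumes the file holds the six rows shown above and keeps them in a string (<code>csv_text</code> is a made-up name), then passes a file-like object to <code>pd.read_csv</code> instead of a path.

```python
import io

import pandas as pd

# Hypothetical stand-in for weather.csv: the same six rows held in a string.
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
1/4/2017,24,7,Snow
1/5/2017,32,4,Rain
1/6/2017,32,2,Sunny
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

The first line of the text becomes the column names, and pandas adds the default integer index on its own.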


<b>Creating a DataFrame from a Python dictionary</b>: 

In [8]:
weather_data = {
    'date': ['1/1/2017','1/2/2017','1/3/2017','1/4/2017','1/5/2017','1/6/2017'],
    'temperature': [32,35,28,24,32,31],
    'windspeed': [6,7,2,7,4,2],
    'event': ['Rain','Sunny','Snow','Snow','Rain','Sunny'],
}
df1 = pd.DataFrame(weather_data)
df1

Unnamed: 0,date,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


<hr/>

* [<h6><ins>back to top</ins></h6>](#top)
<a class='anchor' id='dealing'></a>
<h2>Dealing with Rows and Columns</h2>

We can take a look at the <code>.shape</code> of the data and the <code>.head()</code> and/or <code>.tail()</code> to gain insights about the data's rows and columns. 

<b>Getting the number of rows and columns</b>: 

In [9]:
rows, columns = df.shape
print(f'ROWS: {rows}\nCOLS: {columns}')

ROWS: 6
COLS: 4


<b>Using <code>.head()</code> to view the first few rows of the data</b>:

In [15]:
df.head(2) #shows only first 2 rows

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny


In [17]:
df.tail(2) #shows last two rows

Unnamed: 0,day,temperature,windspeed,event
4,1/5/2017,32,4,Rain
5,1/6/2017,32,2,Sunny
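
As a self-contained recap, the sketch below rebuilds the weather table from the values printed above (an assumption, since the original CSV is not included here) and shows <code>.shape</code>, <code>len()</code>, <code>.head()</code> and <code>.tail()</code> side by side.

```python
import pandas as pd

# Rebuilt from the output shown above; the original CSV is assumed to match.
df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 32],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

rows, columns = df.shape   # shape is a (rows, columns) tuple
print(rows, columns)
print(len(df))             # len() gives the row count only

first_rows = df.head()     # with no argument, head() returns the first 5 rows
last_rows = df.tail(3)     # the last 3 rows
print(first_rows)
print(last_rows)
```

Note that <code>.head()</code> and <code>.tail()</code> return new DataFrames, so they keep the original index labels.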


<hr/>

* [<h6><ins>back to top</ins></h6>](#top)
<a class='anchor' id='slice'></a>
<h2>Slicing and Indexing</h2>  

<blockquote><b>Topics</b>: <code>[start : stop]</code> | <code>.columns</code> | <code>.COL_NAME</code> | <code>df[ [COL1,COL2,COL3] ]</code></blockquote>
                   

<b>Slicing follows the <code>[start : stop]</code> notation</b>:

In [21]:
df[2:5] #Includes row 2 -> row 4

Unnamed: 0,day,temperature,windspeed,event
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain


<b>The <code>df.columns</code> attribute holds all the column names</b>:

In [19]:
df.columns

Index(['day', 'temperature', 'windspeed', 'event'], dtype='object')

<b><code>df.COL_NAME</code> returns the contents of that column as a Series</b>:

In [22]:
df.day #same as df['day']

0    1/1/2017
1    1/2/2017
2    1/3/2017
3    1/4/2017
4    1/5/2017
5    1/6/2017
Name: day, dtype: object

<b>Using <code>df[ [COL, COL, COL] ]</code> lets you select multiple columns</b>:

In [25]:
df[['day','event','windspeed']]

Unnamed: 0,day,event,windspeed
0,1/1/2017,Rain,6
1,1/2/2017,Sunny,7
2,1/3/2017,Snow,2
3,1/4/2017,Snow,7
4,1/5/2017,Rain,4
5,1/6/2017,Sunny,2
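
The selections above can be compared with the more explicit <code>.iloc</code> (by position) and <code>.loc</code> (by label) indexers. This is a minimal sketch on a table rebuilt from the outputs above. The variable names are only for illustration.

```python
import pandas as pd

# Rebuilt from the output shown above; the original CSV is assumed to match.
df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 32],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

by_slice = df[2:5]            # rows at positions 2, 3, 4 (stop is excluded)
by_position = df.iloc[2:5]    # the same rows, selected explicitly by position
by_label = df.loc[2:4]        # label-based slicing includes the stop label

one_column = df['day']        # single brackets: a Series
sub_table = df[['day']]       # double brackets: a DataFrame with one column

print(by_slice)
print(type(one_column).__name__, type(sub_table).__name__)
```

With the default integer index all three row selections give the same result. They differ once the index holds other labels.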


<hr>

* [<h6><ins>back to top</ins></h6>](#top)
<a class='anchor' id='insert'></a>
<h2>Insert New Cell in Current Cell</h2>

You already know the shortcut keys <code>'a'</code> for 'Insert Cell <b>Above</b>' and <code>'b'</code> for 'Insert Cell <b>Below</b>' when in command mode. 

<hr/>

* [<h6><ins>back to top</ins></h6>](#top)
<a class='anchor' id='dtype'></a>
<h2>Finding the DataFrame Type</h2>

In [74]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 6 entries, 1/1/2017 to 1/6/2017
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   temperature  6 non-null      int64 
 1   windspeed    6 non-null      int64 
 2   event        6 non-null      object
dtypes: int64(2), object(1)
memory usage: 364.0+ bytes
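
The output above lists three columns and a date-labelled index, which suggests the cell was run while <code>day</code> was set as the index. If you only need the column types, <code>df.dtypes</code> is a shorter route than <code>df.info()</code>. The sketch below rebuilds the table from the values shown earlier (an assumption) and inspects the types.

```python
import pandas as pd

# Rebuilt from the output shown earlier; the original CSV is assumed to match.
df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 32],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

print(df.dtypes)                  # one dtype per column, returned as a Series
print(df['temperature'].dtype)    # dtype of a single column

# Keep only the numeric columns.
numeric_only = df.select_dtypes(include='number')
print(list(numeric_only.columns))
```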


<hr/>

* [<h6><ins>back to top</ins></h6>](#top)
<a class='anchor' id='operations'></a>
<h2>Operations: min, max, std, describe</h2>

<b><code>.std()</code> function</b>

In [30]:
df.temperature.std() #or df['temperature'].std()

3.8858718455450894

<b><code>.mean()</code> function</b>:

In [29]:
df.temperature.mean() #or df['temperature'].mean()

30.5

<b><code>.max()</code> function</b>:

In [28]:
df.temperature.max() #or df['temperature'].max()

35

<b><code>.min()</code> function</b>:

In [31]:
df.temperature.min() #or df['temperature'].min()

24

<b><code>.describe()</code> function</b>:

In [32]:
df.describe()

Unnamed: 0,temperature,windspeed
count,6.0,6.0
mean,30.5,4.666667
std,3.885872,2.33809
min,24.0,2.0
25%,29.0,2.5
50%,32.0,5.0
75%,32.0,6.75
max,35.0,7.0
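
It can help to see where the <code>.std()</code> number comes from. By default pandas computes the sample standard deviation, dividing by <code>n - 1</code> (<code>ddof=1</code>). The sketch below redoes the calculation by hand on the six temperatures shown above. The variable names are only for illustration.

```python
import math

import pandas as pd

# The six temperature readings from the table above.
temperature = pd.Series([32, 35, 28, 24, 32, 32], name='temperature')

mean = temperature.mean()

# Sample variance: sum of squared deviations divided by n - 1.
sample_var = ((temperature - mean) ** 2).sum() / (len(temperature) - 1)
manual_std = math.sqrt(sample_var)

print(manual_std)
print(temperature.std())          # pandas default, ddof=1
print(temperature.std(ddof=0))    # population standard deviation, divides by n
```

The first two printed values agree. The <code>ddof=0</code> result is slightly smaller because it divides by <code>n</code> instead.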


<hr>

* [<h6><ins>back to top</ins></h6>](#top)
<a class='anchor' id='conditionals'></a>
<h2>Conditional Selection</h2>

You can write DataFrame conditionals much like SQL queries. For example, the following query reads:<br><br><b><i>'Return all rows where temperature is greater than or equal to 32'</i></b>:

In [33]:
df[df.temperature>=32]

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
4,1/5/2017,32,4,Rain
5,1/6/2017,32,2,Sunny
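
Conditions can also be combined, much like <code>AND</code> / <code>OR</code> in SQL. The sketch below is one possible way to do it on a table rebuilt from the outputs above: each condition goes in its own parentheses and they are joined with <code>&amp;</code> or <code>|</code>. <code>DataFrame.query</code> is shown as an alternative spelling.

```python
import pandas as pd

# Rebuilt from the output shown above; the original CSV is assumed to match.
df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 32],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

# AND: both conditions must hold. The parentheses are required.
warm_and_rainy = df[(df.temperature >= 32) & (df.event == 'Rain')]

# OR: either condition may hold.
snow_or_calm = df[(df.event == 'Snow') | (df.windspeed <= 2)]

# The same AND filter written as a query string.
via_query = df.query("temperature >= 32 and event == 'Rain'")

print(warm_and_rainy)
print(snow_or_calm)
```

Python's own <code>and</code> / <code>or</code> keywords do not work on whole columns, which is why the element-wise operators are used here.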


<b>Get me a row that has the maximum temperature</b>:

In [37]:
df[df.temperature==df.temperature.max()] 
#OR...
#df[df.temperature==df['temperature'].max()] 
#OR...
#df[df['temperature']==df['temperature'].max()]

Unnamed: 0,day,temperature,windspeed,event
1,1/2/2017,35,7,Sunny


<b>Get me the day that had the maximum temperature</b>:

In [49]:
df.day[df.temperature==df.temperature.max()]
#OR...
#df['day'][df.temperature==df.temperature.max()]
#OR...
#df['day'][df['temperature']==df.temperature.max()]
#OR...
#df['day'][df['temperature']==df['temperature'].max()]

1    1/2/2017
Name: day, dtype: object

<b>Give me the day AND the temperature together</b>:

In [50]:
df[['day','temperature']][df.temperature==df.temperature.max()]

Unnamed: 0,day,temperature
1,1/2/2017,35
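
The same result can be written as a single <code>.loc</code> call that takes the row condition and the column list together, which avoids chaining two sets of brackets. This is a sketch on a table rebuilt from the outputs above. <code>is_hottest</code> and the other names are made up for the example.

```python
import pandas as pd

# Rebuilt from the output shown above; the original CSV is assumed to match.
df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 32],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

# Boolean mask: True on rows where the temperature equals the maximum.
is_hottest = df.temperature == df.temperature.max()

# Rows and columns selected in one step.
hottest = df.loc[is_hottest, ['day', 'temperature']]
print(hottest)

# idxmax() gives the index label of the first maximum.
hottest_label = df.temperature.idxmax()
hottest_day = df.loc[hottest_label, 'day']
print(hottest_day)
```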


<hr/>

* [<h6><ins>back to top</ins></h6>](#top)
<a class='anchor' id='set_index'></a>
<h2><code>Set_Index</code></h2>

Every DataFrame has an index assigned to it automatically. You can view it with <code>df.index</code>.

You can replace the default index with a column whose values identify each row more meaningfully by calling <code>df.set_index(COL)</code>.

This allows you to look up rows by label using the <code>.loc</code> indexer:

In [54]:
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,32,2,Sunny


In [62]:
df.set_index('day', inplace=True)

In [63]:
df

day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
1/4/2017,24,7,Snow
1/5/2017,32,4,Rain
1/6/2017,32,2,Sunny


<b>Searching for values with the new index and <code>.loc[NAME]</code></b>:

In [64]:
df.loc['1/6/2017']

temperature       32
windspeed          2
event          Sunny
Name: 1/6/2017, dtype: object

<b>To reset back to the default index, use <code>.reset_index(inplace=True)</code></b>:

In [59]:
df.reset_index(inplace=True)
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,32,2,Sunny
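
If you would rather not modify <code>df</code> in place, both methods also return a new DataFrame when <code>inplace</code> is left out. The sketch below shows that variant on a table rebuilt from the outputs above. <code>indexed</code> and <code>restored</code> are illustrative names.

```python
import pandas as pd

# Rebuilt from the output shown above; the original CSV is assumed to match.
df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 32],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

# Without inplace=True, set_index returns a new DataFrame and leaves df alone.
indexed = df.set_index('day')
print(indexed.index)

# Look up a whole row, or a single value, by its label.
print(indexed.loc['1/6/2017'])
print(indexed.loc['1/2/2017', 'temperature'])

# reset_index moves 'day' back into a regular column.
restored = indexed.reset_index()
print(restored)
```

Keeping the original and the indexed version in separate variables makes it harder to lose track of which index is active when cells are run out of order.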
