# Pandas
<li>Pandas is an open-source Python package, built on top of NumPy, for working with datasets.</li> 
<li>The name "Pandas" is derived from "panel data" and is also a play on <b>"Python Data Analysis".</b></li>
<li>Pandas is widely considered one of the best data-wrangling packages.</li>
<li>Pandas offers easy-to-use data structures and tools for analyzing, cleaning, exploring, and manipulating data.</li>
<li>It also works well with other Python data science libraries, such as NumPy and Matplotlib.</li>


## Why Use Pandas?

<li>Pandas is known for its exceptional ability to represent and organize data.</li>
<li>The Pandas library was designed to work with large datasets quickly and efficiently.</li>
<li>It is well suited to analyzing large amounts of data and drawing conclusions based on statistical methods.</li>
<li>Pandas can clean messy datasets and make them readable and relevant.</li>
<li>By building on NumPy and integrating with Matplotlib, Pandas offers users a powerful tool for performing <b>data analytics and visualization.</b></li>
<li>Data can be imported into Pandas from a variety of file formats, such as CSV, SQL, Excel, and JSON.</li>
<li>Pandas is a versatile and marketable skill for data analysts and data scientists, one that can catch the attention of employers.</li>


## Installation Of Pandas
<li>Open your terminal, activate your virtual environment, and then run the following command to install pandas:</li>

<code>
    pip install pandas
</code>

## Importing Pandas
<li>We need to import pandas before we can create a pandas DataFrame and analyze it.</li>
<li>We can import the pandas package using the following command:</li>
<code>
    import pandas as pd
</code>

In [2]:
import pandas as pd

## How To Create A Pandas DataFrame
<li>A pandas DataFrame is a two-dimensional data structure, similar to a 2D array, that arranges data in a table of rows and columns.</li>
<li>We can create a basic pandas DataFrame in several ways.</li>
<li>Let's look at some of these methods:</li>

### 1. From Python Dictionary

In [3]:
data = {
    "name": ["naresh","ram"],
    "age": [24,25],
    "address": ["bkt","ktm"]
}

In [4]:
df = pd.DataFrame(data)
df

|   | name | age | address |
|---|---|---|---|
| 0 | naresh | 24 | bkt |
| 1 | ram | 25 | ktm |


### 2. From a list of dictionaries

In [5]:
list_dic = [
    {
        "name": "Hari",
        "age": 21
    },
    {
        "name": "amisha",
        "age": 20
    }
]

In [6]:
list_df = pd.DataFrame(list_dic)
list_df

|   | name | age |
|---|---|---|
| 0 | Hari | 21 |
| 1 | amisha | 20 |


### 3. From a list of tuples

In [7]:
list_tuple = [
    ("naresh", 23, "bkt"),
    ("megamind", 34, "ktm")
]

In [8]:
tuple_df = pd.DataFrame(list_tuple)
tuple_df

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | naresh | 23 | bkt |
| 1 | megamind | 34 | ktm |
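Without column names, pandas labels the columns 0, 1, 2. The following minimal sketch passes the `columns` parameter of `pd.DataFrame` to give them names. The names here are chosen only for illustration, and the same parameter also works for a list of lists.

```python
import pandas as pd

# Same rows as the list of tuples above; the column names are chosen here for illustration
rows = [
    ("naresh", 23, "bkt"),
    ("megamind", 34, "ktm"),
]
people_df = pd.DataFrame(rows, columns=["name", "age", "address"])
print(people_df)
```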


### 4. From list of lists

In [9]:
nested_list = [
    ["naresh", 23, "ktm"],
    ["megamind", 24, "bkt"]
]

In [10]:
nested_df = pd.DataFrame(nested_list)
nested_df

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | naresh | 23 | ktm |
| 1 | megamind | 24 | bkt |


#### Question
1. Read the 'imports-85.data' file using Python's csv reader.
2. Store the data from the file in a list of lists.
3. Create a pandas DataFrame from the list of lists.
4. For the column names, use the columns variable given below.

In [11]:
import csv
with open("data/imports-85.data") as file:
    reader = csv.reader(file)
    data_list = list(reader)


In [12]:
_data_df = pd.DataFrame(data_list)
_data_df.head()

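|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | ? | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495 |
| 1 | 3 | ? | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500 |
| 2 | 1 | ? | alfa-romero | gas | std | two | hatchback | rwd | front | 94.5 | ... | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 3 | 2 | 164 | audi | gas | std | four | sedan | fwd | front | 99.8 | ... | 109 | mpfi | 3.19 | 3.4 | 10.0 | 102 | 5500 | 24 | 30 | 13950 |
| 4 | 2 | 164 | audi | gas | std | four | sedan | 4wd | front | 99.4 | ... | 136 | mpfi | 3.19 | 3.4 | 8.0 | 115 | 5500 | 18 | 22 | 17450 |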


In [13]:
columns = ['symboling', 'normalized_losses', 'make', 'fuel_type', 'aspiration', 'num_of_doors',
           'body_style', 'drive_wheels', 'engine_location', 'wheel_base', 'length', 'width',
           'height', 'curb_weight', 'engine_type', 'num_of_cylinders', 'engine_size', 'fuel_system',
           'bore', 'stroke', 'compression', 'horsepower', 'peak_rpm', 'city_mpg', 'highway_mpg',
           'price']

In [14]:
_data_df.columns = columns
_data_df

|   | symboling | normalized_losses | make | fuel_type | aspiration | num_of_doors | body_style | drive_wheels | engine_location | wheel_base | ... | engine_size | fuel_system | bore | stroke | compression | horsepower | peak_rpm | city_mpg | highway_mpg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | ? | alfa-romero | gas | std | two | convertible | rwd | front | 88.60 | ... | 130 | mpfi | 3.47 | 2.68 | 9.00 | 111 | 5000 | 21 | 27 | 13495 |
| 1 | 3 | ? | alfa-romero | gas | std | two | convertible | rwd | front | 88.60 | ... | 130 | mpfi | 3.47 | 2.68 | 9.00 | 111 | 5000 | 21 | 27 | 16500 |
| 2 | 1 | ? | alfa-romero | gas | std | two | hatchback | rwd | front | 94.50 | ... | 152 | mpfi | 2.68 | 3.47 | 9.00 | 154 | 5000 | 19 | 26 | 16500 |
| 3 | 2 | 164 | audi | gas | std | four | sedan | fwd | front | 99.80 | ... | 109 | mpfi | 3.19 | 3.40 | 10.00 | 102 | 5500 | 24 | 30 | 13950 |
| 4 | 2 | 164 | audi | gas | std | four | sedan | 4wd | front | 99.40 | ... | 136 | mpfi | 3.19 | 3.40 | 8.00 | 115 | 5500 | 18 | 22 | 17450 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 200 | -1 | 95 | volvo | gas | std | four | sedan | rwd | front | 109.10 | ... | 141 | mpfi | 3.78 | 3.15 | 9.50 | 114 | 5400 | 23 | 28 | 16845 |
| 201 | -1 | 95 | volvo | gas | turbo | four | sedan | rwd | front | 109.10 | ... | 141 | mpfi | 3.78 | 3.15 | 8.70 | 160 | 5300 | 19 | 25 | 19045 |
| 202 | -1 | 95 | volvo | gas | std | four | sedan | rwd | front | 109.10 | ... | 173 | mpfi | 3.58 | 2.87 | 8.80 | 134 | 5500 | 18 | 23 | 21485 |
| 203 | -1 | 95 | volvo | diesel | turbo | four | sedan | rwd | front | 109.10 | ... | 145 | idi | 3.01 | 3.40 | 23.00 | 106 | 4800 | 26 | 27 | 22470 |
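With the csv reader, every value arrives as a string, and missing entries stay as the literal "?". You can also build the same kind of DataFrame in one step with `pd.read_csv`, whose `na_values` parameter turns "?" into NaN. The sketch below uses a tiny in-memory stand-in instead of the real file: only the first three fields of two rows shown above. For the real file, you would pass its path and the full columns list.

```python
import io
import pandas as pd

# Tiny stand-in for 'imports-85.data' (first three fields of two rows from the output above)
raw = io.StringIO("3,?,alfa-romero\n2,164,audi\n")

cars = pd.read_csv(
    raw,
    header=None,                                        # the file has no header row
    names=["symboling", "normalized_losses", "make"],   # column names supplied by us
    na_values="?",                                      # treat '?' as a missing value
)
print(cars)
print(cars.dtypes)  # normalized_losses becomes float64 because it now contains NaN
```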


### 5. Pandas DataFrame From CSV Files

<li>We can use pandas to load a CSV file and create a DataFrame from the data it contains.</li>
<li>The <b>pd.read_csv()</b> function reads a CSV file and returns a pandas DataFrame.</li>

In [15]:
weather_df = pd.read_csv("data/weather_data.csv", names =["day","temperature","windspeed","event"])
weather_df.head()

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | kfjkdfjskd |  |  |  |
| 1 | dfuhsdjufio |  |  |  |
| 2 | day | temperature | windspeed | event |
| 3 | 1/1/2017 | 32 | 6 | Rain |
| 4 | 1/4/2017 | not available | 9 | Sunny |


#### Reading a CSV file using the skiprows and header parameters

In [16]:
weather_df = pd.read_csv("data/weather_data.csv", skiprows=2)
weather_df.head()

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |


In [17]:
weather_df = pd.read_csv("data/weather_data.csv", header=2)
weather_df.head()

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
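The two calls above give the same result. `skiprows=2` drops two lines and then reads the next line as the header. `header=2` uses the line at index 2 as the header and discards everything before it. The minimal sketch below checks this on a hypothetical in-memory file that mimics the layout of weather_data.csv (two junk lines, then the real header). It also shows the `skiprows=3, header=None, names=...` variant used in the next section.

```python
import io
import pandas as pd

# Hypothetical file contents mimicking weather_data.csv: two junk lines, a header, data rows
text = (
    "kfjkdfjskd,,,\n"
    "dfuhsdjufio,,,\n"
    "day,temperature,windspeed,event\n"
    "1/1/2017,32,6,Rain\n"
    "1/4/2017,not available,9,Sunny\n"
)

by_skiprows = pd.read_csv(io.StringIO(text), skiprows=2)  # skip 2 lines, then read the header
by_header = pd.read_csv(io.StringIO(text), header=2)      # line at index 2 is the header
print(by_skiprows.equals(by_header))                      # True

# Skip the header line too and supply our own column names instead
by_names = pd.read_csv(io.StringIO(text), skiprows=3, header=None,
                       names=["date", "temperature", "windspeed", "event"])
print(by_names)
```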


#### Reading a CSV file without a header and naming the columns

In [18]:
weather_df_header = pd.read_csv("data/weather_data.csv", skiprows=3, header=None , names=['date',"temperature","windspeed","event"])
weather_df_header.head()

|   | date | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |


#### Reading a limited number of rows from a CSV file using the nrows parameter

In [19]:
weather_df_header = pd.read_csv("data/weather_data.csv", skiprows=3, header=None , names=['date',"temperature","windspeed","event"],nrows=8)
weather_df_header

|   | date | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
| 5 | 1/8/2017 | not available | not measured | Sunny |
| 6 | 1/9/2017 | not available | not measured | no event |
| 7 | 1/10/2017 | 34 | 8 | Cloudy |


In [20]:
weather_df_header = pd.read_csv("data/weather_data.csv", skiprows=3, header=None , names=['date',"temperature","windspeed","event"],nrows=5)
weather_df_header

|   | date | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |


#### Reading CSV files with the na_values parameter ('weather_data.csv' file)

In [21]:
weather_df_header = pd.read_csv("data/weather_data.csv", skiprows=3, header=None , names=['date',"temperature","windspeed","event"],nrows=5,na_values=["not available","not measured","no event"])
weather_df_header

|   | date | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/4/2017 |  | 9.0 | Sunny |
| 2 | 1/5/2017 | -1.0 |  | Snow |
| 3 | 1/6/2017 |  | 7.0 |  |
| 4 | 1/7/2017 | 32.0 |  | Rain |
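The output above shows the main effect of `na_values`. Without it, a column such as temperature mixes numbers with placeholder text and is read as strings. With it, the placeholders become NaN and the column turns numeric (float64, since NaN is a float). The sketch below reproduces this on the first four data rows shown earlier, held in memory instead of read from the file.

```python
import io
import pandas as pd

# The first four data rows of weather_data.csv, as shown in the outputs above
text = (
    "1/1/2017,32,6,Rain\n"
    "1/4/2017,not available,9,Sunny\n"
    "1/5/2017,-1,not measured,Snow\n"
    "1/6/2017,not available,7,no event\n"
)
names = ["date", "temperature", "windspeed", "event"]

raw = pd.read_csv(io.StringIO(text), header=None, names=names)
clean = pd.read_csv(io.StringIO(text), header=None, names=names,
                    na_values=["not available", "not measured", "no event"])

print(raw.dtypes)           # temperature and windspeed are read as text
print(clean.dtypes)         # temperature and windspeed are now float64
print(clean.isna().sum())   # number of missing values per column
```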


#### Writing a pandas DataFrame to a CSV file
1. We can write a pandas DataFrame to a CSV file using the .to_csv() method.
2. You can give the CSV file any name when writing a DataFrame to it. Passing index=False stops pandas from writing the row index as an extra column.

In [22]:
weather_df_header.to_csv("weather_nan_data.csv", index=False)

In [23]:
nan_df = pd.read_csv("weather_nan_data.csv")
nan_df

|   | date | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/4/2017 |  | 9.0 | Sunny |
| 2 | 1/5/2017 | -1.0 |  | Snow |
| 3 | 1/6/2017 |  | 7.0 |  |
| 4 | 1/7/2017 | 32.0 |  | Rain |
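To see why `index=False` matters, the sketch below writes a small DataFrame to text rather than to a file. When no path is given, `to_csv()` returns a string. If the index is written and then read back, it reappears as an extra column named "Unnamed: 0", which is the same kind of column dropped in the Excel section below. The two rows reuse values from the table above.

```python
import io
import pandas as pd

df = pd.DataFrame({"date": ["1/1/2017", "1/4/2017"], "temperature": [32.0, None]})

with_index = df.to_csv()               # no path given, so a string is returned
without_index = df.to_csv(index=False)

# Reading the index=True version back produces an extra, unnamed index column
back = pd.read_csv(io.StringIO(with_index))
print(back.columns.tolist())           # ['Unnamed: 0', 'date', 'temperature']

clean_back = pd.read_csv(io.StringIO(without_index))
print(clean_back.columns.tolist())     # ['date', 'temperature']
```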


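### 6. Pandas DataFrame From Excel Files
* We can use pandas to load an Excel file with the .xlsx extension and create a DataFrame from the data it contains.
* The .read_excel() function reads an Excel file and returns a pandas DataFrame.
* We also need to install openpyxl to work with .xlsx files. Install it into the same environment the notebook kernel runs in (the %pip magic does this). Otherwise, read_excel() raises ImportError: Missing optional dependency 'openpyxl', as in the output below. You may need to restart the kernel after installing.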

In [24]:
%pip install openpyxl






In [25]:
import pandas as pd

In [26]:
xl_df = pd.read_excel("data/weather_data.xlsx")

ImportError: Missing optional dependency 'openpyxl'.  Use pip or conda to install openpyxl.

In [None]:
xl_df

In [None]:
type(xl_df)

In [None]:
xl_df.columns

In [None]:
xl_df.reset_index(drop=True, inplace=True)

In [None]:
xl_df

In [None]:
# Remove the leftover "Unnamed: 0" column (typically an index column saved with the file)
xl_df.drop(columns=["Unnamed: 0"], inplace=True)

In [None]:
xl_df

#### Writing to an Excel file
* We can write a pandas DataFrame to an Excel file using the .to_excel() method.

In [27]:
xl_df.to_excel("weather_nan_data.xlsx", index=False)

NameError: name 'xl_df' is not defined

In [28]:
# reading weather_nan_data.xlsx
df = pd.read_excel("weather_nan_data.xlsx")
df.head()

ImportError: Missing optional dependency 'openpyxl'.  Use pip or conda to install openpyxl.

#### Using the head() and tail() methods to view the first 5 and last 5 rows
<li>To view the first few rows of our dataframe, we can use the DataFrame.head() method.</li>
<li>By default, it returns the first five rows of our dataframe.</li>
<li>However, it also accepts an optional integer parameter, which specifies the number of rows.</li>

<li>Similarly, to view the last few rows of our dataframe, we can use the DataFrame.tail() method.</li>
<li>By default, it returns the last five rows of our dataframe.</li>
<li>However, it also accepts an optional integer parameter, which specifies the number of rows.</li>
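A quick sketch of both methods on a toy DataFrame of ten numbered rows. The data is made up so that the returned rows are easy to check.

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})  # toy data: rows numbered 0..9

print(df.head())    # first 5 rows (default)
print(df.head(3))   # first 3 rows
print(df.tail(2))   # last 2 rows
```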

#### Question:

<li>Use the head() method to select the first 6 rows.</li>
<li>Use the tail() method to select the last 8 rows.</li>

In [29]:
# Use the head() method to select the first 6 rows.
df = pd.read_excel("weather_nan_data.xlsx")
df.head(6)

ImportError: Missing optional dependency 'openpyxl'.  Use pip or conda to install openpyxl.

In [None]:
# Use the tail() method to select the last 8 rows.
df.tail(8)

#### Finding the column names from the dataframe
<li>The df.columns attribute returns the column names of a pandas DataFrame.</li>
<li>Similarly, the df.values attribute returns the data in the DataFrame as a NumPy array.</li>
<li>Exercise: check the type of the columns and the values, the shape, and the number of dimensions; slice the DataFrame; and print the rows containing "no event", "not measured", and "not available".</li>

In [None]:
# The df.columns attribute returns the column names of the DataFrame.
df.columns

In [None]:
# The df.values attribute returns the data in the DataFrame as a NumPy array.
df.values

In [None]:
# Check the type of the columns and values, the shape, and the number of dimensions
print("Checking the column type:")
print(f"Type of columns: {type(df.columns)}")
print("_______________________________________")
print("Checking the type of the values:")
print(f"Type of values: {type(df.values)}")
print("_______________________________________")
print("Checking the shape of the DataFrame:")
print(f"Shape of the DataFrame: {df.shape}")
print("_______________________________________")
print("Checking the dimensions of the DataFrame:")
print(f"Number of dimensions: {df.ndim}")
print("_______________________________________")

In [None]:
# slicing
df.loc[:,["day","temperature"]]

In [None]:
df.loc[:,"day":"event"]

In [None]:
df.iloc[:,0:3]

In [None]:
# Print the raw data, which still contains "no event", "not measured", and "not available"
df = pd.read_csv("data/weather_data.csv", skiprows=2)
df.head()

#### Checking the data types of a DataFrame
<li>Another feature that makes pandas better for working with data is that dataframes can contain more than one data type.</li>
<li>Axis values can have string labels, not just numeric ones.</li>
<li>Dataframes can contain columns with multiple data types: including integer, float, and string.</li>
<li>We can use the DataFrame.dtypes attribute (similar to NumPy) to return information about the types of each column.</li>
<li>When we import data, pandas attempts to guess the correct dtype for each column.</li>
<li>Generally, pandas does well with this, which means we don't need to worry about specifying dtypes every time we start to work with data.</li>
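The sketch below shows this type inference on a toy CSV held in memory. One column holds whole numbers, one holds decimals, and one holds text, so pandas infers a different dtype for each.

```python
import io
import pandas as pd

# Toy CSV: an integer column, a float column, and a text column
text = "id,price,make\n1,13495.0,audi\n2,16500.5,volvo\n"
df = pd.read_csv(io.StringIO(text))

print(df.dtypes)  # id -> int64, price -> float64, make -> a text (non-numeric) dtype
```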



In [None]:
df.dtypes

#### Data Type Information
<li>We can get the shape of the dataset using the <b>.shape</b> attribute.</li>
<li><b>.shape</b> is a tuple containing the number of rows and the number of columns in the dataset.</li>
<li>For an overview of all the dtypes used in our DataFrame, we can use the <b>.info()</b> method.</li>
<li>Note that <b>DataFrame.info()</b> prints the information rather than returning it, so assigning its result to a variable just gives None.</li>
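A small sketch on toy data that contrasts the two. `.shape` is an attribute holding a tuple, so it takes no parentheses. `.info()` is a method that prints a summary and returns None.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, None, 6.0]})  # toy data

print(df.shape)        # (3, 2): attribute, no parentheses
rows, cols = df.shape  # the tuple can be unpacked

result = df.info()     # prints dtypes, non-null counts, and memory usage
print(result)          # None
```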


In [None]:
# .shape
df.shape

In [None]:
# .info()
df.info()

In [None]:
# reading nan weather data
n_df = pd.read_csv("weather_nan_data.csv")
n_df

In [None]:
# shape of nan weather data
n_df.shape

In [None]:
# .info()
n_df.info()

#### Checking the null values in the pandas dataframe

In [None]:
# checking null values using .isnull()
n_df.isnull()

In [None]:
# checking null values using .isna()
n_df.isna()


In [None]:
# find the number of null values in DataFrame
n_df.isna().sum()

Here, the temperature and windspeed columns each have two missing values, the event column has one, and the date column has none.

#### The set_index() and reset_index() methods

In [None]:
# set_index()
n_df.set_index(keys=['date','event'])

Here, the set_index() method sets the DataFrame's index from the column(s) passed in the keys parameter. Each column label passed in keys must exist in DataFrame.columns.

In [None]:
n_df.set_index(keys='index')  # raises KeyError: 'index' is not a column of n_df

In [None]:
# set_index() returns a new DataFrame, so n_df is unchanged without inplace=True
n_df


In [None]:
n_df.set_index(keys='date',inplace=True)
n_df

In [None]:
# Reset the index; drop=True discards the 'date' labels instead of restoring them as a column
n_df.reset_index(drop=True, inplace=True)
n_df
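The following sketch, on a two-row toy DataFrame, shows the three cases side by side. `set_index()` moves a column into the index. `reset_index()` moves the index back into a column. `reset_index(drop=True)` throws the index labels away. Without `inplace=True`, the original DataFrame is left untouched.

```python
import pandas as pd

df = pd.DataFrame({"date": ["1/1/2017", "1/4/2017"], "temperature": [32.0, None]})

indexed = df.set_index("date")              # 'date' becomes the index; df is unchanged
restored = indexed.reset_index()            # 'date' moves back into a column
dropped = indexed.reset_index(drop=True)    # the 'date' labels are discarded

print(indexed.index.tolist())     # ['1/1/2017', '1/4/2017']
print(restored.columns.tolist())  # ['date', 'temperature']
print(dropped.columns.tolist())   # ['temperature']
```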

#### Selecting a column from a pandas DataFrame

<li>Since the axes in pandas have labels, we can select data using those labels.</li> 
<li>Unlike in NumPy, we do not need to know the exact integer position of the data we want.</li>
<li>To do this, we can use the DataFrame.loc[] indexer. The syntax for DataFrame.loc[] is:</li>
<code>
df.loc[row_label, column_label]
</code>

<li>We can use the following shortcut to select a single column:</li>
<code>
df["column_name"]
</code>

<li>This style of selecting columns is very common.</li>
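The sketch below uses a toy DataFrame to show that the two forms select the same data. Selecting a single column this way returns a pandas Series, not a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"day": ["1/1/2017", "1/4/2017"], "event": ["Rain", "Sunny"]})

by_loc = df.loc[:, "event"]   # label-based: all rows, the 'event' column
by_bracket = df["event"]      # shortcut for the same selection

print(type(by_bracket))            # a pandas Series
print(by_loc.equals(by_bracket))   # True
```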


In [None]:
weather_df = pd.read_csv("data/weather_data.csv", names=['day','temperature','windspeed','event'])
weather_df

In [None]:
weather_df.dropna(inplace=True)

In [None]:
weather_df.drop(index=2, inplace=True)
weather_df

In [None]:
weather_df.reset_index(drop=True, inplace=True)
weather_df.head()

In [None]:
weather_df.loc[:,"event"]

In [None]:
weather_df["event"]

In [None]:
weather_df.iloc[:,3:]

#### Questions

<li>Read <b>'appointment_schedule.csv'</b> file using pandas.</li>
<li>Select the <b>'name'</b> column from the given dataset and store it in the <b>'appointment_names'</b> variable.</li>
<li>Use Python's <b>type()</b> function to assign the type of the name column to <b>name_type</b>.</li>

In [None]:
# Read 'appointment_schedule.csv' file using pandas.
schedule_df = pd.read_csv("data/appointment_schedule.csv")
schedule_df.head()

In [None]:
# Select the 'name' column from the given dataset and store it in the 'appointment_names' variable.
appointment_names = schedule_df["name"]
appointment_names

In [None]:
# Use Python's type() function to assign the type of the name column to name_type.
name_type = type(appointment_names)
print(f"The type of the name column: {name_type}")

#### Pandas Series
<li>Series is the pandas type for one-dimensional objects.</li>
<li>Anytime you see a 1D pandas object, it will be a series. Anytime you see a 2D pandas object, it will be a dataframe.</li>
<li>A dataframe is a collection of series objects, which is similar to how pandas stores the data behind the scenes.</li>

#### Adding a column to a pandas DataFrame

In [None]:
n_df

In [None]:
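import numpy as np
# insert() modifies n_df in place; running this cell twice raises a ValueError because 'id' already exists
n_df.insert(loc=0, column='id', value=np.array([1, 2, 3, 4, 5]))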
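`insert()` places a new column at a chosen position. Plain assignment with `df["new_col"] = ...` is the common alternative, and it appends the column at the end. The sketch below, on toy data, shows both. It also shows that `insert()` refuses a column name that already exists. The column name "below_zero" is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"temperature": [32.0, None, -1.0]})  # toy data

df.insert(loc=0, column="id", value=[1, 2, 3])  # placed at position 0, in place
df["below_zero"] = df["temperature"] < 0        # appended as the last column (NaN < 0 is False)

print(df.columns.tolist())  # ['id', 'temperature', 'below_zero']

duplicate_rejected = False
try:
    df.insert(loc=0, column="id", value=[1, 2, 3])
except ValueError:
    duplicate_rejected = True  # 'id' already exists
print(duplicate_rejected)      # True
```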

### Selecting Multiple Columns From the DataFrame

![](images/selecting_columns.png)

<li>We can select multiple columns from the DataFrame using the following code:</li>
<code>
    df.loc[:, ["col1", "col2"]]
</code>

<li>We can use syntax shortcuts for selecting multiple columns by using the following syntax:</li>
<code>
    df[["col1", "col2"]]
</code>

In [None]:
car_detail = pd.read_csv("data/car_details.csv")
car_detail.head()

In [None]:
car_detail.loc[:,["fuel","owner"]]

In [None]:
car_detail[['fuel','owner']]

#### Selecting Rows From A Pandas DataFrame

<li>Now that we've learned how to select columns by label, let's learn how to select rows using the labels of the index axis.</li>
<li>We can use the same syntax to select rows from a dataframe as we do for columns:</li>
<code>
    df.loc[row_label, column_label]
</code>

#### Indexing & Slicing In Pandas DataFrame

<li>We can slice a DataFrame by rows as well as by columns.</li>
<li>If we have data of shape (5, 5) and want the first three rows and first three columns, we need to slice both the rows and the columns.</li>
<li>The df.iloc[] indexer selects by integer position, so we can use it for both indexing and slicing.</li>
<li>Let's practice with .iloc[].</li>


In [None]:
car_detail.iloc[:,1:4]
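The sketch below covers both ideas from this section on a toy 5x5 DataFrame with string row labels. `.loc[]` selects rows by label, and `.iloc[]` slices the first three rows and first three columns by position.

```python
import pandas as pd

# Toy 5x5 DataFrame with string row labels
df = pd.DataFrame(
    [[r * 5 + c for c in range(5)] for r in range(5)],
    index=["a", "b", "c", "d", "e"],
    columns=["c0", "c1", "c2", "c3", "c4"],
)

row_b = df.loc["b"]                 # one row by label -> Series
rows_bd = df.loc[["b", "d"], "c1"]  # two rows by label, one column
top_left = df.iloc[:3, :3]          # first three rows and first three columns by position

print(top_left)
print(top_left.shape)  # (3, 3)
```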

#### Data Type Conversion In Pandas

<li>Pandas astype() is one of the most important methods. It is used to change the data type of a Series.</li>
<li>When a pandas DataFrame is created from a CSV file, the data types are set automatically.</li>
<li>At times the inferred data type is not what it should be, and this is where we can use astype() to get the desired data type.</li>
<li>For example, a salary column could be imported as strings, but to do arithmetic on it we have to convert it to float.</li>
<li>astype() is used for such data type conversions.</li>

In [None]:
car_detail["selling_price"].dtype

In [None]:
selling_price = car_detail["selling_price"]
s_p = selling_price.astype(dtype=float)
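The salary example from the text, sketched with a hypothetical column that was read in as strings. After `astype(float)`, arithmetic works. If any value cannot be parsed as a number, `astype()` raises a ValueError rather than silently skipping it.

```python
import pandas as pd

# Hypothetical salary column that was read in as text
salary = pd.Series(["50000", "62000.5", "48000"])

salary_float = salary.astype(float)   # convert the strings to float64
print(salary_float.dtype)             # float64
print(salary_float.mean())            # 53333.5

# astype() raises a ValueError if a value cannot be converted
conversion_failed = False
try:
    pd.Series(["50000", "not available"]).astype(float)
except ValueError:
    conversion_failed = True
print(conversion_failed)              # True
```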

#### Value Counts Method

<li>Since series and dataframes are two distinct objects, they have their own unique methods.</li>

<li>Let's look at an example of a series method - the Series.value_counts() method.</li>

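<li>This method returns each unique non-null value in a column along with its count, sorted by count in descending order by default.</li>

<li>In pandas versions before 1.1.0, value_counts() was a Series-only method, and calling it on a DataFrame raised the following error (since pandas 1.1.0, DataFrame.value_counts() exists and counts unique rows):</li>

<code>
    AttributeError: 'DataFrame' object has no attribute 'value_counts'
</code>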

#### Creating a frequency table from value_counts

In [None]:
s_p.value_counts()
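The sketch below illustrates a few `value_counts()` options on toy data: missing values are excluded unless `dropna=False`, and `normalize=True` gives proportions. The last lines use `DataFrame.value_counts()` (pandas 1.1 or later), which counts unique row combinations. The fuel/owner values are invented for illustration.

```python
import pandas as pd

events = pd.Series(["Rain", "Sunny", "Rain", None, "Snow", "Rain"])  # toy data

counts = events.value_counts()               # missing values are excluded by default
print(counts)
print(events.value_counts(normalize=True))   # proportions instead of counts
print(events.value_counts(dropna=False))     # includes the missing value

# DataFrame.value_counts() (pandas >= 1.1) counts unique rows
df = pd.DataFrame({"fuel": ["Petrol", "Diesel", "Petrol"],
                   "owner": ["First", "First", "First"]})
print(df.value_counts())
```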

#### Renaming the columns of a pandas DataFrame

In [None]:
# rename() returns a new DataFrame; car_detail itself is unchanged unless inplace=True
car_detail.rename(columns={"selling_price": "s_p"})

#### Selecting Items From A Series

<li>As with dataframes, we can use Series.loc[] to select items from a series using single labels, a list, or a slice object.</li>
<li>We can also omit loc[] and use bracket shortcuts for all three:</li>

In [None]:
s_p.loc[1:5]

In [None]:
s_p.iloc[:5]
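The two cells above differ in a way that is easy to miss. A `.loc[]` slice uses labels and includes the end label, so `loc[1:5]` returns six items. An `.iloc[]` slice uses positions and excludes the end, like a Python list. A toy Series makes the difference visible:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60, 70])  # default index 0..6

print(s.loc[1:5].tolist())       # label slice, end included: [20, 30, 40, 50, 60]
print(s.iloc[:5].tolist())       # position slice, end excluded: [10, 20, 30, 40, 50]
print(s.iloc[[0, -1]].tolist())  # first and last items: [10, 70]
```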

#### Question

<li>Use the value counts method to check the frequency count of different names from 'appointment_schedule.csv' file.</li>
<li>Select only first row from the series.</li>
<li>Select the first row and the last row from the series.</li>
<li>Select the first five rows and the last five rows from the series.</li>



In [31]:
schedule_df = pd.read_csv("data/appointment_schedule.csv")
schedule_df.head()

|   | name | appointment_made_date | app_start_date | app_end_date | visitee_namelast | visitee_namefirst | meeting_room | description |
|---|---|---|---|---|---|---|---|---|
| 0 | Joshua T. Blanton | 2014-12-18T00:00:00 | 1/6/15 9:30 | 1/6/15 23:59 |  | potus | west wing | JointService Military Honor Guard |
| 1 | Jack T. Gutting | 2014-12-18T00:00:00 | 1/6/15 9:30 | 1/6/15 23:59 |  | potus | west wing | JointService Military Honor Guard |
| 2 | Bradley T. Guiles | 2014-12-18T00:00:00 | 1/6/15 9:30 | 1/6/15 23:59 |  | potus | west wing | JointService Military Honor Guard |
| 3 | Loryn F. Grieb | 2014-12-18T00:00:00 | 1/6/15 9:30 | 1/6/15 23:59 |  | potus | west wing | JointService Military Honor Guard |
| 4 | Travis D. Gordon | 2014-12-18T00:00:00 | 1/6/15 9:30 | 1/6/15 23:59 |  | potus | west wing | JointService Military Honor Guard |


#### Use the value_counts method to check the frequency count of different names in the 'appointment_schedule.csv' file.

In [32]:
schedule_df['name'].describe(include="object")

count                    585
unique                   542
top       Jesus MurilloKaram
freq                       3
Name: name, dtype: object

In [33]:
schedule_df['name'].value_counts()

name
Jesus MurilloKaram            3
Michael A. Marr               2
JoseAntonio MeadeKuribrena    2
Todd S. Mizis                 2
Kieffer T. Elkins             2
                             ..
Anthony J. Falsone            1
Robert C. Buford              1
Philip Coles                  1
Kristopher L. Davis           1
Joseph A. Pritchard           1
Name: count, Length: 542, dtype: int64

#### Select only the first row from the Series.

In [34]:
schedule_df['name'].loc[0]

'Joshua T. Blanton'

#### Select the first row and the last row from the Series.

In [35]:
schedule_df['name'].iloc[[0, -1]]

0      Joshua T. Blanton
584      Martin O. Reina
Name: name, dtype: object

#### Select the first five rows and the last five rows from the Series.

In [36]:
# first_five
schedule_df['name'].head(5)

0    Joshua T. Blanton
1      Jack T. Gutting
2    Bradley T. Guiles
3       Loryn F. Grieb
4     Travis D. Gordon
Name: name, dtype: object

In [37]:
# last_five
schedule_df['name'].tail(5)

580         Ryan J. Morgan
581    Alexander V. Nevsky
582     Montana J. Johnson
583    Joseph A. Pritchard
584        Martin O. Reina
Name: name, dtype: object

In [45]:
# another approach: loop over the positions of the first five and last five rows
n_rows = len(schedule_df)
for index in list(range(5)) + list(range(n_rows - 5, n_rows)):
    print(schedule_df['name'].loc[index])


Joshua T. Blanton
Jack T. Gutting
Bradley T. Guiles
Loryn F. Grieb
Travis D. Gordon
Ryan J. Morgan
Alexander V. Nevsky
Montana J. Johnson
Joseph A. Pritchard
Martin O. Reina


#### DataFrame Vs Series
#### Vectorized Operations In Pandas

<li>We'll explore how pandas uses many of the concepts we learned with NumPy.</li>
<li>Because pandas is designed to work like NumPy, many NumPy concepts and methods are supported.</li>
<li>Recall that one of the ways NumPy makes working with data easier is with vectorized operations.</li>
<li>Just like with NumPy, we can use any of the standard Python numeric operators with series, including:</li>
<code>
    series_a + series_b - Addition
    series_a - series_b - Subtraction
    series_a * series_b - Multiplication
    series_a / series_b - Division
</code>
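A short sketch of these operators, using city and highway mpg values taken from rows 0, 2, and 3 of the car table above. The operation applies element by element without a Python loop. The second part shows a difference from NumPy: pandas aligns Series on their index labels, not their positions, and labels present in only one Series produce NaN.

```python
import pandas as pd

city_mpg = pd.Series([21, 19, 24])
highway_mpg = pd.Series([27, 26, 30])

average = (city_mpg + highway_mpg) / 2  # element-wise, no Python loop
print(average.tolist())                 # [24.0, 22.5, 27.0]

# Operations align on index labels, not on position
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 20], index=["y", "z"])
total = a + b
print(total.to_dict())  # {'x': nan, 'y': 12.0, 'z': nan}
```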

#### Some Statistical Functions In Pandas

<li>Like NumPy, Pandas supports many descriptive stats methods such as mean, median, mode, min, max and so on.</li>
<li>Here are a few of the most useful ones.</li>
<code>
Series.max()
Series.min()
Series.mean()
Series.median()
Series.mode()
Series.sum()
</code>
<li>We can calculate the average value of a particular column (Series) using df.column_name.mean().</li>
<li>To calculate the minimum value in a particular column (Series), we can use df.column_name.min().</li>
<li>Similarly, to calculate the maximum value in a particular column (Series), we can use df.column_name.max().</li>
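A sketch using the city_mpg values of the first five cars in the table above. Note that `mode()` returns a Series rather than a single value, because a column can have more than one most frequent value.

```python
import pandas as pd

df = pd.DataFrame({"city_mpg": [21, 21, 19, 24, 18]})  # first five city_mpg values above

print(df.city_mpg.mean())           # 20.6
print(df.city_mpg.median())         # 21.0
print(df.city_mpg.mode().tolist())  # [21]
print(df.city_mpg.min(), df.city_mpg.max())  # 18 24
print(df.city_mpg.sum())            # 103
```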

#### Finding the descriptive statistics of the dataframe using .describe() method

<li>Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.</li>
<li>By default, the describe() method in Pandas computes descriptive statistics for all numeric columns.</li>
<li>It can analyze both numeric and object Series, as well as DataFrames whose columns have mixed data types.</li>
<li>The output varies depending on the input.</li>
<li>To see descriptive statistics for columns of object data type, we have to specify <b>df.describe(include = "O")</b></li>

In [None]:
schedule_df.describe()

In [49]:
schedule_df.describe(include = "O")

|   | name | appointment_made_date | app_start_date | app_end_date | visitee_namelast | visitee_namefirst | meeting_room | description |
|---|---|---|---|---|---|---|---|---|
| count | 585 | 585 | 585 | 585 | 56 | 585 | 585 | 213 |
| unique | 542 | 11 | 23 | 9 | 5 | 6 | 13 | 9 |
| top | Jesus MurilloKaram | 2015-01-09T00:00:00 | 1/12/15 13:00 | 1/12/15 23:59 | / | POTUS | State Floo | JointService Military Honor Guard |
| freq | 3 | 247 | 217 | 286 | 36 | 376 | 279 | 95 |
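The two kinds of summary can be compared on a toy DataFrame with one text column and one numeric column. By default, `describe()` reports count, mean, std, min, quartiles, and max for the numeric columns. With `include="O"`, it reports count, unique, top, and freq for the object columns. The text column is built with `dtype=object` explicitly so that `include="O"` selects it regardless of the pandas version's default string dtype.

```python
import pandas as pd

df = pd.DataFrame({
    "name": pd.Series(["Ann", "Bob", "Ann"], dtype=object),  # toy text column
    "visits": [1, 3, 2],                                      # toy numeric column
})

numeric_summary = df.describe()             # numeric columns only
object_summary = df.describe(include="O")   # object columns: count, unique, top, freq

print(numeric_summary)
print(object_summary)
```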
