## Exercise 1 - Pandas: DataFrame and Series

<p><b>Pandas</b> is a popular library for data analysis built on top of the Python programming language. It provides two main data structures for manipulating data:</p>

<ul>
    <li>DataFrame</li>
    <li>Series</li>
</ul>

<p>A <code>DataFrame</code> is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.</p>

<ul>
    <li>A Pandas DataFrame is often created by loading a dataset from existing storage.</li>
    <li>The storage can be a SQL database, a CSV file, an Excel file, etc.</li>
    <li>It can also be created from lists, a dictionary, or a list of dictionaries.</li>
</ul>
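<p>As a minimal sketch of the in-memory options listed above (the variable names and sample values here are illustrative assumptions, not part of the lab), the same small table can be built from lists, from a dictionary, or from a list of dictionaries:</p>

```python
import pandas as pd

# From a list of lists, with the column names supplied separately
from_lists = pd.DataFrame([["Rose", 1], ["John", 2]], columns=["Name", "ID"])

# From a dictionary: each key becomes a column label
from_dict = pd.DataFrame({"Name": ["Rose", "John"], "ID": [1, 2]})

# From a list of dictionaries: each dictionary becomes one row
from_records = pd.DataFrame([{"Name": "Rose", "ID": 1}, {"Name": "John", "ID": 2}])

# Print each table as a plain dictionary of column lists for comparison
for frame in (from_lists, from_dict, from_records):
    print(frame.to_dict("list"))
```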

<p><code>Series</code> represents a one-dimensional array of indexed data.
It has two main components:</p>

<ol>
    <li>An array of actual data.</li>
    <li>An associated array of indexes or data labels.</li>
</ol>

<p>The index is used to access individual data values. You can also get a column of a DataFrame as a <b>Series</b>, so you can think of a Pandas Series as a single labeled column.</p>
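<p>The two components of a Series can be inspected directly. The sketch below uses a hypothetical <code>salaries</code> Series with made-up labels to show the data array, the index, and a lookup by label:</p>

```python
import pandas as pd

# A Series pairs an array of values with an array of index labels
salaries = pd.Series([100000, 80000, 50000], index=["Rose", "John", "Jane"])

print(salaries.values)    # the underlying data
print(salaries.index)     # the associated labels
print(salaries["John"])   # look up one value by its label
```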

In [1]:
import pandas as pd

<p>Once you’ve imported pandas, you can use the functions built into it to create and analyze data.</p>

<p><b>In this practice lab, we will learn how to create a DataFrame out of a dictionary.</b></p>

<p>Let us consider a dictionary "x" with keys and values as shown below.</p>
<p>We then create a DataFrame from the dictionary using <code>pd.DataFrame(x)</code>.</p>

In [2]:
x = {
    "Name": ["Rose", "John", "Jane", "Mary"],
    "ID": [1, 2, 3, 4],
    "Department": ["Architect Group", "Software Group", "Design Team", "Infrastructure"],
    "Salary": [100000, 80000, 50000, 60000]
}
x

{'Name': ['Rose', 'John', 'Jane', 'Mary'],
 'ID': [1, 2, 3, 4],
 'Department': ['Architect Group',
  'Software Group',
  'Design Team',
  'Infrastructure'],
 'Salary': [100000, 80000, 50000, 60000]}

In [3]:
df = pd.DataFrame(x)
df

   Name  ID       Department  Salary
0  Rose   1  Architect Group  100000
1  John   2   Software Group   80000
2  Jane   3      Design Team   50000
3  Mary   4   Infrastructure   60000


<p>We can see the direct correspondence between the dictionary and the table. The keys correspond to the column labels, and each list of values fills the corresponding column, one entry per row.</p>

### Column Selection

<p>To select a column in a Pandas DataFrame, we can access it by its column name.</p>
<p>Let's retrieve the data present in the <code>ID</code> column.</p>

In [4]:
col_id = df[["ID"]]
col_id

   ID
0   1
1   2
2   3
3   4


<p>Let's use the <code>type()</code> function and check the type of the variable.</p>

In [5]:
type(col_id)

pandas.core.frame.DataFrame

<p>The output shows us that the type of the variable is a DataFrame object.</p>

### Access to multiple columns

<p>Now let us retrieve the data for <code>Department</code>, <code>Salary</code> and <code>ID</code> columns.</p>

In [6]:
col_3 = df[["Department", "Salary", "ID"]]
col_3

        Department  Salary  ID
0  Architect Group  100000   1
1   Software Group   80000   2
2      Design Team   50000   3
3   Infrastructure   60000   4


### Try it yourself

<p>Problem 1: Create a dataframe to display the result as below:</p>

<table>
    <thead>
    <tr>
        <th style="border: 1px solid; text-align: center">No.</th>
        <th style="border: 1px solid; text-align: center">Student</th>
        <th style="border: 1px solid; text-align: center">Age</th>
        <th style="border: 1px solid; text-align: center">Country</th>
        <th style="border: 1px solid; text-align: center">Course</th>
        <th style="border: 1px solid; text-align: center">Marks</th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <td style="border: 1px solid; text-align: center">0</td>
        <td style="border: 1px solid; text-align: center">David</td>
        <td style="border: 1px solid; text-align: center">27</td>
        <td style="border: 1px solid; text-align: center">UK</td>
        <td style="border: 1px solid; text-align: center">Python</td>
        <td style="border: 1px solid; text-align: center">85</td>
    </tr>
    <tr>
        <td style="border: 1px solid; text-align: center">1</td>
        <td style="border: 1px solid; text-align: center">Samuel</td>
        <td style="border: 1px solid; text-align: center">24</td>
        <td style="border: 1px solid; text-align: center">Canada</td>
        <td style="border: 1px solid; text-align: center">Data Structures</td>
        <td style="border: 1px solid; text-align: center">72</td>
    </tr>
    <tr>
        <td style="border: 1px solid; text-align: center">2</td>
        <td style="border: 1px solid; text-align: center">Terry</td>
        <td style="border: 1px solid; text-align: center">22</td>
        <td style="border: 1px solid; text-align: center">China</td>
        <td style="border: 1px solid; text-align: center">Machine Learning</td>
        <td style="border: 1px solid; text-align: center">89</td>
    </tr>
    <tr>
        <td style="border: 1px solid; text-align: center">3</td>
        <td style="border: 1px solid; text-align: center">Evan</td>
        <td style="border: 1px solid; text-align: center">32</td>
        <td style="border: 1px solid; text-align: center">USA</td>
        <td style="border: 1px solid; text-align: center">Web Development</td>
        <td style="border: 1px solid; text-align: center">76</td>
    </tr>
    </tbody>
</table>

In [7]:
members = {
    "Student": ["David", "Samuel", "Terry", "Evan"],
    "Age": [27, 24, 22, 32],
    "Country": ["UK", "Canada", "China", "USA"],
    "Course": ["Python", "Data Structures", "Machine Learning", "Web Development"],
    "Marks": [85, 72, 89, 76]
}
members

{'Student': ['David', 'Samuel', 'Terry', 'Evan'],
 'Age': [27, 24, 22, 32],
 'Country': ['UK', 'Canada', 'China', 'USA'],
 'Course': ['Python',
  'Data Structures',
  'Machine Learning',
  'Web Development'],
 'Marks': [85, 72, 89, 76]}

In [8]:
mb_df = pd.DataFrame(members)
mb_df

   Student  Age  Country            Course  Marks
0    David   27       UK            Python     85
1   Samuel   24   Canada   Data Structures     72
2    Terry   22    China  Machine Learning     89
3     Evan   32      USA   Web Development     76


<p>Problem 2: Retrieve the Marks column and assign it to a variable b</p>

In [9]:
b = mb_df[["Marks"]]
b

   Marks
0     85
1     72
2     89
3     76


<p>Problem 3: Retrieve the Country and Course columns and assign them to a variable c</p>

In [10]:
c = mb_df[["Country", "Course"]]
c

   Country            Course
0       UK            Python
1   Canada   Data Structures
2    China  Machine Learning
3      USA   Web Development


<p>To view a column as a Series, use a single pair of brackets instead of double brackets:</p>

In [11]:
a = mb_df["Student"]
a

0     David
1    Samuel
2     Terry
3      Evan
Name: Student, dtype: object

In [12]:
type(a)

pandas.core.series.Series

<p>The output shows us that the type of the variable is a Series object.</p>
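<p>The difference between single and double brackets can be seen side by side. The following is a small sketch on a hypothetical <code>scores</code> DataFrame (not one of the lab's variables):</p>

```python
import pandas as pd

scores = pd.DataFrame({"Student": ["David", "Samuel"], "Marks": [85, 72]})

as_series = scores["Marks"]    # single brackets: a one-dimensional Series
as_frame = scores[["Marks"]]   # double brackets: a one-column DataFrame

# Print the type name and shape of each result
print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```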

## Exercise 2 - `loc()` and `iloc()` functions

<p><code>loc</code> is a label-based selection method (often written <code>loc()</code>, although it is used with square brackets rather than called), which means that we pass the label of the row or column that we want to select. When a range of labels is passed, the last element of the range is included.</p>

<p>Simple syntax for your understanding:</p>

<ul><li>loc[row_label, column_label]</li></ul>

<p><code>iloc</code> is an integer position-based selection method (often written <code>iloc()</code>, although it is also used with square brackets), which means that we pass an integer position to select a specific row or column. When a range of positions is passed, the last element of the range is not included.</p>

<p>Simple syntax for your understanding:</p>

<ul><li>iloc[row_index, column_index]</li></ul>
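<p>To make the two syntaxes concrete, here is a short sketch on a hypothetical <code>staff</code> DataFrame whose rows are labeled by name. The same cell is reached once by labels and once by positions:</p>

```python
import pandas as pd

staff = pd.DataFrame(
    {"ID": [1, 2, 3], "Salary": [100000, 80000, 50000]},
    index=["Rose", "John", "Jane"],
)

by_label = staff.loc["John", "Salary"]   # row label, column label
by_position = staff.iloc[1, 1]           # row position, column position

# Both expressions point at the same cell
print(by_label == by_position)
```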

<p>Let us look at some examples.</p>

In [13]:
# Access the value on the first row and the first column
mb_df.iloc[0, 0]

'David'

In [14]:
# Access the value on the first row and the third column
mb_df.iloc[0, 2]

'UK'

In [15]:
# Access the value in the row labeled 0 of the "Marks" column
mb_df.loc[0, "Marks"]

np.int64(85)

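<p>Let us create a new dataframe called "df2" and assign "df" to it. Now, let us set the "Name" column as the index column using the method <code>set_index()</code>. Because <code>set_index()</code> returns a new DataFrame by default, "df" keeps its original integer index.</p>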

In [16]:
df2 = df
df2 = df2.set_index("Name")

### Try it yourself

<p>Use the <code>loc()</code> function to get the Department of Jane in the newly created dataframe df2.</p>

In [17]:
df2.loc["Jane", "Department"]

'Design Team'

<p>Use the <code>iloc()</code> function to get the Salary of Mary in the newly created dataframe df2.</p>

In [18]:
df2.iloc[3, 2]

np.int64(60000)

## Exercise 3 - Slicing

<p>Slicing uses the <code>[]</code> operator to select a set of rows and/or columns from a DataFrame.</p>

<p>To slice out a set of rows, use the syntax <code>data[start:stop]</code>, where <code>start</code> is the index of the first row to include and, for positional slicing, <code>stop</code> is the index one step BEYOND the last row you want to select. You can slice by integer position with <code>iloc</code> or by row and column labels with <code>loc</code>.</p>

> NOTE: When slicing in pandas, the start bound is included in the output.

<p>So if you want to select rows 0, 1, and 2 your code would look like this: <code>df.iloc[0:3]</code>.</p>

<p>It means you are telling Python to start at index 0 and select rows 0, 1, 2 up to but not including 3.</p>

> NOTE: Labels must be found in the DataFrame or you will get a KeyError.
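<p>A minimal sketch of the note above, using a hypothetical two-row <code>staff</code> DataFrame: asking <code>loc</code> for a single row label that does not exist raises a <code>KeyError</code>, which can be caught like any other exception.</p>

```python
import pandas as pd

staff = pd.DataFrame({"ID": [1, 2]}, index=["Rose", "John"])

try:
    staff.loc["Mary", "ID"]   # "Mary" is not a row label in this frame
    missing_label_raised = False
except KeyError:
    missing_label_raised = True

print(missing_label_raised)
```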

<p>Indexing by labels (i.e. using <code>loc()</code>) differs from indexing by integers (i.e. using <code>iloc()</code>). With <code>loc()</code>, both the start bound and the stop bound are inclusive. When using <code>loc()</code>, integers can be used, but the integers refer to the index label and not the position.</p>

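<p>For example, on a DataFrame with the default integer index and at least five rows, using <code>loc()</code> to select 1:4 returns the rows labeled 1 through 4, whereas using <code>iloc()</code> to select 1:4 returns only the rows at positions 1 through 3.</p>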
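<p>The inclusive and exclusive stop bounds can be checked on a small example. This sketch assumes a hypothetical six-row <code>numbers</code> DataFrame with the default integer index:</p>

```python
import pandas as pd

numbers = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60]})  # index labels 0..5

label_slice = numbers.loc[1:4]      # labels 1 through 4, stop included
position_slice = numbers.iloc[1:4]  # positions 1 through 3, stop excluded

# Compare how many rows each slice returns
print(len(label_slice), len(position_slice))
```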

<p>We can also select a specific data value using a row and column location within the DataFrame and <code>iloc</code> indexing.</p>

In [19]:
# Slice the original dataframe df by position: rows 0-1 and columns 0-2
df.iloc[0:2, 0:3]

   Name  ID       Department
0  Rose   1  Architect Group
1  John   2   Software Group


In [20]:
# Slice the original dataframe df with loc(), whose index has the labels 0, 1, 2 and 3; both bounds are included
df.loc[0:2, "ID":"Department"]

   ID       Department
0   1  Architect Group
1   2   Software Group
2   3      Design Team


In [21]:
# Slice the new dataframe df2 with loc(), whose index is Name, selecting the labels Rose, John and Jane
df2.loc["Rose":"Jane", "ID":"Department"]

      ID       Department
Name
Rose   1  Architect Group
John   2   Software Group
Jane   3      Design Team


### Try it yourself

<p>Using the <code>loc()</code> function, slice the original dataframe df to retrieve the Name, ID and Department for the rows with index labels 2 and 3.</p>

In [22]:
df.loc[2:3, "Name":"Department"]

   Name  ID      Department
2  Jane   3     Design Team
3  Mary   4  Infrastructure

