# **Guided LAB 343.3.5 - Exploratory Data Analysis on CSV data - Basic insights from the Data**
---


## **Lab Overview**

This lab focuses on introducing fundamental data analysis techniques using Python's Pandas library. We'll primarily work with a CSV file containing employee data, exploring various methods to load, manipulate, and gain insights from it.

In this lab, we will demonstrate, with examples, how to read a CSV file with or without a header, skip rows, load only selected columns, set a column as the index, and more. We will then perform basic Exploratory Data Analysis (EDA) on a CSV file.

**Key Activities:**

1. **Data Loading and Initial Exploration:**  We'll begin by importing the Pandas library and utilizing the `read_csv()` function to load the employee dataset into a Pandas DataFrame. We'll then use methods like `head()`, `tail()`, and `info()` to get an initial overview of the data's structure and content.
2. **Data Handling Techniques:** We will delve into practical data handling strategies, such as skipping rows or selecting specific columns during the data loading process using parameters like `skiprows` and `usecols` with `read_csv()`.
3. **Basic Exploratory Data Analysis (EDA):** We'll perform basic EDA to understand the dataset's characteristics. This includes examining data types, identifying potential missing values, and assessing the overall shape and size of the DataFrame.

**Learning Outcomes:**

By the end of this lab, you will be proficient in:

* Importing and utilizing the Pandas library in Google Colab.
* Loading CSV data into a Pandas DataFrame.
* Applying data manipulation techniques to extract desired information.
* Performing basic EDA to gain insights from datasets.

## **Introduction:**
Use the pandas `read_csv()` function to read a CSV (comma-separated values) file into a pandas DataFrame. The function also supports options for reading any delimited file.

## **Dataset:**
In this lab, we will use a dummy employee dataset.

[Click here to download employee dataset (employee.csv)](https://drive.google.com/file/d/14RV1xKIRzWS166LtGqnPC1Wg7eTlI_y1/view?usp=drive_link)

---

# **Begin**



**Example 1: Reading Data from CSVs**

Note: If you get a parsing error when loading the file, use the line below instead:
`df = pd.read_csv('employee.csv', on_bad_lines='skip')`



In [None]:
import pandas as pd
df = pd.read_csv('employee.csv')
df
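
The `on_bad_lines='skip'` option from the note above can be sketched on a small in-memory CSV. The rows and the extra field below are made-up stand-ins, not the contents of `employee.csv`, and the option assumes pandas 1.3 or newer.

```python
import io
import pandas as pd

# Hypothetical stand-in data: the third data row has one field too many.
raw = io.StringIO(
    "Name,Age,Department,Salary\n"
    "Ana,30,HR,50000\n"
    "Ben,41,IT,62000\n"
    "Cy,29,IT,58000,extra\n"
    "Dee,35,Ops,54000\n"
)

# on_bad_lines='skip' drops rows with too many fields instead of raising an error.
df_clean = pd.read_csv(raw, on_bad_lines='skip')
print(df_clean)
```

Skipping silently discards data, so it is worth comparing the row count with what you expect from the file.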


Note: Use the `sep` argument (or its alias, `delimiter`) to specify the column separator. By default, `read_csv()` uses a comma.
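
As a minimal sketch of the `sep` argument, the snippet below reads semicolon-separated text. The data is a hypothetical in-memory sample, so it runs without any file on disk.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data (not the lab's employee.csv).
raw = io.StringIO(
    "Name;Age;Department;Salary\n"
    "Ana;30;HR;50000\n"
    "Ben;41;IT;62000\n"
)

# Tell read_csv() that columns are separated by ';' rather than ','.
df_semicolon = pd.read_csv(raw, sep=';')
print(df_semicolon)
```

Without `sep=';'`, each line would be read as a single column, which is a common sign that the wrong separator is in use.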


**Example 2: Viewing and Exploring Your Data**


The first thing to do when opening a new dataset is to print out a few rows to keep as a visual reference. We accomplish this with .head():





In [None]:
df.head()

`.head()` outputs the first five rows of your DataFrame by default, but you can also pass a number: `df.head(10)` outputs the first ten rows.


In [None]:

df.head(10)


To see the last five rows, use `df.tail()`. It also accepts a number; the call here prints the bottom two rows.



In [None]:
df.tail(2)
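
To see how the row count passed to `.head()` and `.tail()` behaves, here is a small sketch on a made-up eight-row DataFrame (the names and salaries are placeholders, not the lab data).

```python
import pandas as pd

# Hypothetical eight-row DataFrame used only for illustration.
demo = pd.DataFrame({
    "Name": [f"Employee{i}" for i in range(1, 9)],
    "Salary": [40000 + 1000 * i for i in range(1, 9)],
})

first_three = demo.head(3)   # first 3 rows
last_two = demo.tail(2)      # last 2 rows
default_head = demo.head()   # no argument: first 5 rows

print(first_three)
print(last_two)
```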

**Example 3: Getting Information About your Data**

.info() should be one of the very first commands you run after loading your data:




In [None]:

df.info()

**.info()** provides the essential details about your dataset such as the number of rows and columns, the number of non-null values, what type of data is in each column, and how much memory your DataFrame is using.
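
The overview lists checking data types and missing values among the EDA steps. Alongside `.info()`, one possible way to inspect them directly is with `.dtypes` and `.isna().sum()`. The sketch below assumes a small hypothetical sample with two blank cells.

```python
import io
import pandas as pd

# Hypothetical sample with one missing Age and one missing Department.
raw = io.StringIO(
    "Name,Age,Department,Salary\n"
    "Ana,30,HR,50000\n"
    "Ben,,IT,62000\n"
    "Cy,29,,58000\n"
)
demo = pd.read_csv(raw)

# Data type of each column.
print(demo.dtypes)

# Count of missing (NaN) values in each column.
missing_per_column = demo.isna().sum()
print(missing_per_column)
```

A numeric column that contains missing values is typically loaded as a float type, which is one reason the data types are worth checking.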

Another fast and useful attribute is .shape, which returns just a tuple of (rows, columns):









In [None]:
df.shape


Note that `.shape` has no parentheses because it is an attribute, not a method; it returns a simple tuple of (rows, columns). So, we have 15 rows and 4 columns in our employee DataFrame.
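
Because `.shape` is a plain tuple, it can be unpacked into two variables. A minimal sketch with a made-up three-row DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame used only to illustrate tuple unpacking.
demo = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cy"],
    "Salary": [50000, 62000, 58000],
})

# Unpack the (rows, columns) tuple into two variables.
n_rows, n_cols = demo.shape
print(f"{n_rows} rows x {n_cols} columns")
```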

**Example 4: Skip Rows**

Sometimes, you may need to skip the first few rows or the footer rows of a file. To do this, use the `skiprows` and `skipfooter` parameters, respectively.


   


In [None]:

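# Skip the first 5 lines of the file (the header line is one of them),
# so header=None assigns default integer column labels (0, 1, 2, ...)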
df = pd.read_csv('employee.csv', header=None, skiprows=5)
print(df)
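
The cell above shows only `skiprows`. As a hedged sketch of `skipfooter`, the snippet below drops a trailing summary line from hypothetical in-memory data. `skipfooter` is not supported by the default C parser, so `engine='python'` is passed explicitly.

```python
import io
import pandas as pd

# Hypothetical data whose last line is a summary row, not an employee.
csv_text = (
    "Name,Age,Department,Salary\n"
    "Ana,30,HR,50000\n"
    "Ben,41,IT,62000\n"
    "Cy,29,IT,58000\n"
    "Total,,,170000\n"
)

# Drop the last line of the file; the header is kept.
df_no_footer = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine='python')

# A list in skiprows names specific 0-indexed lines to skip:
# line 0 is the header, so [1] skips only the first data row.
df_skip_first = pd.read_csv(
    io.StringIO(csv_text), skiprows=[1], skipfooter=1, engine='python'
)

print(df_no_footer)
print(df_skip_first)
```

Passing a list to `skiprows` keeps the header line intact, whereas an integer skips that many lines from the top of the file, header included.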

**Example 5: Load Only Selected Columns**

The `usecols` argument of `read_csv()` loads only the columns you select. There are two common ways to use this argument:

**Method 1:** Use usecols with Column Names

Syntax:
`df = pd.read_csv('my_data.csv', usecols=['column name one', 'column name two'])`

**Method 2:** Use usecols with Column Positions

Syntax:
`df = pd.read_csv('my_data.csv', usecols=[0, 2])`






In [None]:
pd.read_csv('employee.csv', usecols=['Name', 'Salary'])

In [None]:
df = pd.read_csv('employee.csv', usecols =[0,3])
print(df)
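
To check that both methods select the same columns, here is a small sketch on hypothetical in-memory data. The column layout (`Name` at position 0, `Salary` at position 3) is an assumption chosen to mirror the cells above.

```python
import io
import pandas as pd

# Hypothetical four-column sample; Name is column 0 and Salary is column 3.
csv_text = (
    "Name,Age,Department,Salary\n"
    "Ana,30,HR,50000\n"
    "Ben,41,IT,62000\n"
)

# Method 1: select columns by name.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'Salary'])

# Method 2: select the same columns by 0-indexed position.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 3])

# Both DataFrames should contain identical columns and values.
print(by_name.equals(by_position))
```

Selecting by name is usually easier to read and keeps working if the column order in the file changes, while positions are handy when the file has no header.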

## **Submission**
- Submit your completed lab using the Start Assignment button on the assignment page in Canvas.
- Your submission can include:
  - If you are using a notebook, all tasks should be written and submitted in a single notebook file, for example: (**your_name_labname.ipynb**).
  - If you are using a Python script, all tasks should be written and submitted in a single Python script file, for example: (**your_name_labname.py**).
- Add appropriate comments and any additional instructions if required.
