<a href="https://colab.research.google.com/github/ShilpaVasista/Exploratory-Data-Analytics/blob/main/Module_1_Interactive_Lesson_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 1. Setting Up Pandas:

To work with Pandas, you'll need to import it and set some default parameters for better data display.

In [None]:
import numpy as np
import pandas as pd
print("Pandas Version:", pd.__version__)
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)
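
The `pd.set_option` calls above apply to the whole session. As a minimal sketch, a setting can be read back with `pd.get_option` and overridden temporarily with `pd.option_context`; the value 20 below is arbitrary and chosen only for illustration.

```python
import pandas as pd

pd.set_option('display.max_rows', 500)

# Read the current value back to confirm it took effect.
current = pd.get_option('display.max_rows')

# Override the setting only inside the with-block (20 is an arbitrary example value).
with pd.option_context('display.max_rows', 20):
    inside = pd.get_option('display.max_rows')

# After the block, the earlier value is restored.
after = pd.get_option('display.max_rows')
print(current, inside, after)

# Return to the library default when the wider display is no longer needed.
pd.reset_option('display.max_rows')
```

A temporary override can be convenient when only one cell needs a shorter or longer display.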

Exercise:

1. Run the above code and confirm your Pandas version.
2. Set the display.max_columns and display.max_rows to a higher number if needed to view large datasets.

## 2. Creating DataFrames:

You can create DataFrames in multiple ways, such as from Series, dictionaries, or N-dimensional arrays.

From a Series:

In [None]:
series = pd.Series([2, 3, 7, 11, 13, 17, 19, 23])
print(series)
# Build a DataFrame from a dict of columns: 'C' is a Series, and scalar values
# such as 'B', 'F' and 'G' are repeated on every row
series_df = pd.DataFrame({
   'A': range(1, 5),
   'B': pd.Timestamp('20190526'),
   'C': pd.Series(5, index=list(range(4)), dtype='float64'),
   'D': np.array([3] * 4, dtype='int64'),
   'E': pd.Categorical(["Depression", "Social Anxiety", "Bipolar Disorder", "Eating Disorder"]),
   'F': 'Mental health',
   'G': 'is challenging'
})
print(series_df)
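
The cell above builds its columns from a dictionary, so the `series` variable itself is not converted. A minimal sketch of turning that Series directly into a DataFrame with `Series.to_frame()` follows; the column names `value` and `squared` are made up for illustration.

```python
import pandas as pd

series = pd.Series([2, 3, 7, 11, 13, 17, 19, 23])

# to_frame() turns the Series into a one-column DataFrame ('value' is a hypothetical name).
values_df = series.to_frame(name='value')

# Further columns can be derived from the same Series; they share its index.
values_df['squared'] = series ** 2
print(values_df.head(3))
```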

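From a List of Dictionaries:

In [None]:
# Each dictionary becomes one row; a key missing from a row (here 'C') is filled with NaN
dict_df = [{'A': 'Apple', 'B': 'Ball'}, {'A': 'Aeroplane', 'B': 'Bat', 'C': 'Cat'}]
dict_df = pd.DataFrame(dict_df)
print(dict_df)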

From a Dictionary of Lists (array-like columns):

In [None]:
sdf = {
   'County': ['Østfold', 'Akershus', 'Oslo', 'Hedmark', 'Oppland', 'Buskerud'],
   'ISO-Code': [1, 2, 3, 4, 5, 6],
   'Area': [4180.69, 4917.94, 454.07, 27397.76, 25192.10, 14910.94],
   'Administrative centre': ["Sarpsborg", "Oslo", "City of Oslo", "Hamar", "Lillehammer", "Drammen"]
}
sdf = pd.DataFrame(sdf)
print(sdf)
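
A DataFrame can also be built from a NumPy array directly. The sketch below assumes a small 2-D array with made-up values and hypothetical column names `x` and `y`; note that a DataFrame holds at most two dimensions, so higher-dimensional arrays need to be reshaped first.

```python
import numpy as np
import pandas as pd

# A 2-D array: 3 rows, 2 columns.
arr = np.array([[1, 2], [3, 4], [5, 6]])

# The array carries no column names, so they are supplied here (hypothetical names).
array_df = pd.DataFrame(arr, columns=['x', 'y'])
print(array_df)
```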

Exercise:

1. Create a DataFrame using a list of numbers.
2. Convert the Series into a DataFrame with multiple columns as shown in the code example.

## 3. Loading Data from a CSV File:

You can load a dataset from an external source into a pandas DataFrame.

In [None]:
columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num', 'marital_status', 'occupation',
           'relationship', 'ethnicity', 'gender', 'capital_gain', 'capital_loss', 'hours_per_week',
           'country_of_origin', 'income']
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', names=columns)
print(df.head(10))
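
To try `names=` without a network connection, the sketch below reads a small in-memory sample instead of the URL. It assumes a headerless layout in which fields are separated by a comma followed by a space, which is how this file appears to be formatted; the three-column sample is a cut-down illustration, not the real data.

```python
import io
import pandas as pd

# Two illustrative rows in a headerless, ", "-separated layout.
raw = "39, State-gov, 13\n50, Self-emp-not-inc, 13\n"

# names= supplies the header; skipinitialspace=True drops the space after each comma,
# so string values come out as 'State-gov' rather than ' State-gov'.
sample = pd.read_csv(io.StringIO(raw), names=['age', 'workclass', 'education_num'],
                     skipinitialspace=True)
print(sample)
```

Without `skipinitialspace=True`, comparisons such as `sample['workclass'] == 'State-gov'` would not match, because the stored strings would keep their leading space.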

Exercise:

Load your own dataset using pd.read_csv() and check the first 10 rows using .head(10).


## 4. Inspecting DataFrames:

You can view information about the DataFrame, such as column names, data types, and memory usage.

In [None]:
df.info()
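
The same details that df.info() prints are also available as separate attributes and methods. A minimal sketch on a small stand-in DataFrame (made-up values, so it runs without a download):

```python
import pandas as pd

# A small stand-in DataFrame with made-up values.
people = pd.DataFrame({'age': [39, 50, 38],
                       'workclass': ['State-gov', 'Private', 'Private']})

print(people.shape)                     # (rows, columns)
print(people.dtypes)                    # data type of each column
print(people.memory_usage(deep=True))   # bytes per column, including string contents
```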

Exercise:

Use df.info() to inspect the DataFrame you loaded.

## 5. Selecting Rows and Columns:

You can select rows and columns by integer position using the .iloc[] indexer. Positions start at 0, and the end of a slice is excluded.

In [None]:
df.iloc[10]        # Row at position 10 (the 11th row), returned as a Series
df.iloc[0:10]      # First 10 rows (positions 0-9)
df.iloc[10:15]     # Rows at positions 10-14
df.iloc[-2:]       # Last 2 rows
df.iloc[::2, 3:5]  # Every other row, columns at positions 3 and 4
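
Position-based slicing is easier to see on a tiny frame. The sketch below uses made-up values and a string index so that positions and labels differ; `.loc` is included only for contrast, since it selects by label and includes the end label.

```python
import pandas as pd

# Made-up frame with a non-default index.
letters = pd.DataFrame({'a': [10, 20, 30, 40], 'b': [1, 2, 3, 4], 'c': [5, 6, 7, 8]},
                       index=['w', 'x', 'y', 'z'])

first_two = letters.iloc[0:2]          # positions 0 and 1; the end (2) is excluded
every_other = letters.iloc[::2, 1:3]   # rows at positions 0 and 2, columns at positions 1 and 2
by_label = letters.loc['w':'x']        # label slices with .loc include the end label
print(every_other)
```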

Exercise:

Use .iloc[] to select specific rows and columns from your DataFrame.

## 6. Combining NumPy with Pandas:

You can combine NumPy arrays with Pandas DataFrames.

In [None]:
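np.random.seed(24)
# Start with one column 'F' holding the values 1.0 to 10.0
dFrame = pd.DataFrame({'F': np.linspace(1, 10, 10)})
# Append five columns of random normal values; rows are matched on the index (0-9)
dFrame = pd.concat([dFrame, pd.DataFrame(np.random.randn(10, 5), columns=list('EDCBA'))], axis=1)
# Insert a missing value in the first row, third column ('D')
dFrame.iloc[0, 2] = np.nan
print(dFrame)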
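
With axis=1, pd.concat places the frames side by side and matches rows by index label rather than by position. A minimal sketch with made-up frames of different lengths shows what happens to the unmatched row:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'F': [1.0, 2.0, 3.0]})                 # index 0, 1, 2
right = pd.DataFrame(np.ones((2, 2)), columns=['A', 'B'])   # index 0, 1

# Rows are aligned on the index; index 2 exists only in `left`,
# so its 'A' and 'B' cells are filled with NaN.
combined = pd.concat([left, right], axis=1)
print(combined)
```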

Exercise:

Create a DataFrame using np.random.randn() and concatenate it with an existing DataFrame.

## 7. Styling DataFrames:

You can apply custom styles to highlight values based on conditions.

In [None]:
def colorNegativeValueToRed(value):
    if value < 0:
        color = 'red'
    elif value > 0:
        color = 'black'
    else:
        color = 'green'
    return 'color: %s' % color

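# Use dFrame from section 6, which has the numeric columns 'A' to 'E'.
# In pandas 2.1 and later, Styler.applymap is deprecated in favour of Styler.map.
s = dFrame.style.applymap(colorNegativeValueToRed, subset=['A', 'B', 'C', 'D', 'E'])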
s
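
An element-wise style function returns one CSS string per cell. To see those strings without rendering anything, the sketch below applies a compact version of the same rule to a plain Series of made-up values; `color_for` is a hypothetical name used only here.

```python
import numpy as np
import pandas as pd

def color_for(value):
    # Same rule as colorNegativeValueToRed: red below zero, black above, green otherwise.
    if value < 0:
        return 'color: red'
    if value > 0:
        return 'color: black'
    return 'color: green'

values = pd.Series([-1.5, 0.0, 2.0, np.nan])
css = values.map(color_for)
print(css.tolist())
```

Because comparisons with NaN are always false, a missing value falls through to the final branch and is colored green, the same as zero.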

Exercise:

Apply styling to your DataFrame, changing colors based on positive, negative, and zero values.

## 8. Highlighting Maximum and Minimum Values:

You can highlight the max and min values in each column.

In [None]:
def highlightMax(s):
    isMax = s == s.max()
    return ['background-color: orange' if v else '' for v in isMax]

def highlightMin(s):
    isMin = s == s.min()
    return ['background-color: green' if v else '' for v in isMin]

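# Use dFrame from section 6. highlight_null takes color= in pandas 1.5 and later
# (earlier versions used null_color=).
dFrame.style.apply(highlightMax).apply(highlightMin).highlight_null(color='red')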
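
Unlike the element-wise function in the previous section, a function passed to style.apply() receives a whole column and must return one CSS string per element. A minimal sketch, calling a copy of the max-highlighting function directly on a made-up Series (`highlight_max` is a hypothetical name used only here):

```python
import pandas as pd

def highlight_max(s):
    # One CSS string per element: orange background where the value equals the column maximum.
    is_max = s == s.max()
    return ['background-color: orange' if v else '' for v in is_max]

column = pd.Series([3, 9, 9, 1])
styles = highlight_max(column)
print(styles)
```

Tied maxima are all highlighted, since every element equal to the maximum matches.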

Exercise:

Apply the highlightMax() and highlightMin() functions to highlight maximum and minimum values in your DataFrame.

## 9. Gradient Background Styling:

You can build a color map with Seaborn and pass it to the Styler's background_gradient() method, which shades each cell according to its value.

In [None]:
import seaborn as sns
colorMap = sns.light_palette("pink", as_cmap=True)
# Use dFrame from section 6 so the gradient is applied to the numeric columns
styled = dFrame.style.background_gradient(cmap=colorMap)
styled

Exercise:

Apply a gradient color to your DataFrame using Seaborn.