# Introduction to Data Analytics and Pandas

## What is Data Analytics?

Data analytics is the process of examining, cleaning, transforming, and interpreting data to discover useful information, draw conclusions, and support decision-making. It draws on techniques such as summary statistics, grouping and aggregation, and visualization to extract insights from raw data.

## Introduction to Pandas

Pandas is a Python library for data manipulation and analysis. Its two core data structures are the DataFrame (a two-dimensional labeled table) and the Series (a one-dimensional labeled array, such as a single DataFrame column), which let you work with structured data efficiently. Pandas is widely used in data science and analytics for tasks such as:

- Loading data from various sources (CSV, Excel, databases, etc.)
- Cleaning and preprocessing data
- Performing data transformations
- Analyzing and summarizing data
- Merging and joining datasets
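
The tasks above can be sketched on a tiny in-memory example. The `orders` and `customers` tables, their column names, and their values are hypothetical and exist only for illustration. In practice the data would usually come from a file or database rather than being typed in.

```python
import pandas as pd

# Hypothetical toy data standing in for a loaded file (e.g. from pd.read_csv).
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "customer_id": [10, 11, 10, 12],
        "amount": [25.0, None, 40.0, 15.0],
    }
)
customers = pd.DataFrame(
    {"customer_id": [10, 11, 12], "city": ["Oslo", "Lima", "Oslo"]}
)

# A single column of a DataFrame is a Series.
amounts = orders["amount"]

# Cleaning: fill the missing amount with the median of the known amounts.
orders["amount"] = amounts.fillna(amounts.median())

# Transformation: derive a new column from an existing one.
orders["is_large"] = orders["amount"] >= 25

# Merging: attach each customer's city to their orders.
merged = orders.merge(customers, on="customer_id", how="left")

# Summarizing: total order amount per city.
total_by_city = merged.groupby("city")["amount"].sum()
print(total_by_city)
```

Filling with the median is just one possible cleaning choice. Dropping the incomplete row with `dropna` would be equally valid, depending on the analysis.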

Let's start by importing pandas, along with Matplotlib and seaborn for plotting, and loading an example dataset.


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the Titanic dataset directly from a URL (requires network access)
titanic_df = pd.read_csv(
    "https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv"
)

In [None]:
# Display the first few rows of the dataset
titanic_df.head()

In [None]:
# Get basic information about the dataset
titanic_df.info()

In [None]:
# Summary statistics of numerical columns
titanic_df.describe()

In [None]:
# Example 1: Survival rate by passenger class
# "Survived" is coded 0/1, so the mean of each group is the fraction who survived
survival_rate = titanic_df.groupby("Pclass")["Survived"].mean()
print("Survival rate by passenger class:")
print(survival_rate)
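
A rate on its own hides how many passengers stand behind it, so it can help to report the group size and survivor count next to the mean. The following is a minimal sketch on a made-up `toy` table (its values are illustrative and not taken from the Titanic data), using named aggregation on the grouped column.

```python
import pandas as pd

# Made-up toy table; values are illustrative and not from the Titanic data.
toy = pd.DataFrame(
    {
        "Pclass": [1, 1, 2, 2, 2, 3],
        "Survived": [1, 0, 1, 1, 0, 0],
    }
)

# One output column per keyword: name="aggregation function".
summary = toy.groupby("Pclass")["Survived"].agg(
    passengers="count",    # group size
    survivors="sum",       # number of rows where Survived == 1
    survival_rate="mean",  # survivors / passengers
)
print(summary)
```

The same `agg` call should work on `titanic_df` unchanged, since it relies only on the `Pclass` and `Survived` columns used in the cell above.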

In [None]:
# Visualize survival rate by passenger class
plt.figure(figsize=(10, 6))
survival_rate.plot(kind="bar", rot=0)  # rot=0 keeps the class labels horizontal
plt.title("Survival Rate by Passenger Class")
plt.xlabel("Passenger Class")
plt.ylabel("Survival Rate")
plt.show()

In [None]:
# Example 2: Age distribution of passengers
plt.figure(figsize=(10, 6))
sns.histplot(data=titanic_df, x="Age", bins=20, kde=True)
plt.title("Age Distribution of Passengers")
plt.xlabel("Age")
plt.ylabel("Count")
plt.show()