# 1_pandas_introduction.py

This notebook was automatically converted from a Python script.

# Pandas: Introduction


Pandas is an open-source data analysis and manipulation library for Python. It provides labeled data structures for storing and manipulating tabular data, time series, and matrix-like data. Pandas is built on top of NumPy and is widely used for data analysis in Python.


## Why Use Pandas?


- Fast and efficient data manipulation with labeled axes


- Handles missing data gracefully


- Easy data alignment and integration


- Powerful grouping and aggregation capabilities


- Spreadsheet-style analysis (filtering, sorting, pivot tables) written as Python code


- Built-in plotting through Matplotlib


- Great for time series analysis
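
Two of the points above, label alignment and missing-data handling, can be seen in a few lines. The following is a minimal sketch; the Series names and values are made up for illustration.

```python
import pandas as pd

# Two Series whose labels only partly overlap (illustrative values).
q1 = pd.Series([100, 200, 300], index=['north', 'south', 'east'])
q2 = pd.Series([10, 20, 30], index=['south', 'east', 'west'])

# Addition aligns on labels, not on position.
# Labels present in only one Series become NaN in the result.
total = q1 + q2

# Series.add with fill_value=0 treats a missing label as zero instead.
total_filled = q1.add(q2, fill_value=0)

print(total)
print(total_filled)
```

Here `north` and `west` appear in only one input, so they are NaN in `total` but keep their single value in `total_filled`.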


Import pandas and numpy libraries


In [None]:
import pandas as pd
import numpy as np



Check version


In [None]:
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")



## Key Pandas Objects


Pandas has two main data structures:


1. **Series**: One-dimensional labeled array (each value is paired with an index label)


2. **DataFrame**: Two-dimensional table with rows and columns (similar to a spreadsheet)


Let's create simple examples of each:


Create a simple Series


In [None]:
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print("Series example:")
print(s)
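
Because a Series carries labels, its values can be reached by label as well as by position. The sketch below rebuilds the same Series and shows a few common access patterns; the variable names are only for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

by_label = s.loc['b']      # label-based lookup
by_position = s.iloc[0]    # position-based lookup
subset = s.loc['b':'c']    # label slices include both endpoints
doubled = s * 2            # arithmetic is element-wise and keeps the labels
large = s[s > 25]          # a boolean mask keeps only the matching entries

print(by_label, by_position)
print(subset)
print(doubled)
print(large)
```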



Create a simple DataFrame


In [None]:
data = {
    'Name': ['Ali', 'Ayşe', 'Mehmet', 'Zeynep'],
    'Age': [25, 30, 35, 28],
    'City': ['Istanbul', 'Ankara', 'Izmir', 'Bursa']
}

df = pd.DataFrame(data)
print("\nDataFrame example:")
print(df)
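
Once a DataFrame exists, columns and rows can be selected in several ways. This is a small sketch using the same data under a separate name (`people`, chosen here so that `df` is not overwritten).

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Ali', 'Ayşe', 'Mehmet', 'Zeynep'],
    'Age': [25, 30, 35, 28],
    'City': ['Istanbul', 'Ankara', 'Izmir', 'Bursa'],
})

ages = people['Age']                     # one column name returns a Series
name_city = people[['Name', 'City']]     # a list of names returns a DataFrame
over_27 = people[people['Age'] > 27]     # a boolean mask filters rows
first_city = people.loc[0, 'City']       # row label 0, column 'City'

print(ages)
print(name_city)
print(over_27)
print(first_city)
```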



## Basic Operations


Let's look at some common operations in Pandas:


View the first rows of the DataFrame


In [None]:
print("First 2 rows:")
print(df.head(2))



View basic statistics (by default, `describe()` summarizes only the numeric columns, here `Age`)


In [None]:
print("\nBasic statistics:")
print(df.describe())
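
To include the text columns in the summary as well, `describe()` accepts `include='all'`. The sketch below rebuilds the example data under a separate name (`people`, an illustrative choice) and compares the two calls.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Ali', 'Ayşe', 'Mehmet', 'Zeynep'],
    'Age': [25, 30, 35, 28],
    'City': ['Istanbul', 'Ankara', 'Izmir', 'Bursa'],
})

# Default: only numeric columns are summarized (count, mean, std, quartiles, ...).
numeric_summary = people.describe()

# include='all' adds the non-numeric columns, with rows such as
# 'unique', 'top', and 'freq' that apply to them.
full_summary = people.describe(include='all')

print(numeric_summary)
print(full_summary)
```

Cells that do not apply to a column (for example, the mean of `Name`) are shown as NaN in the combined summary.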



Get information about the DataFrame (column names, data types, non-null counts)


In [None]:
print("\nDataFrame info:")
df.info()



## Typical Pandas Workflow


A typical workflow using pandas generally includes:


1. **Loading data**: Import data from CSV, Excel, SQL, JSON, etc.


2. **Exploring data**: Look at sample rows, check statistics, understand the structure


3. **Cleaning data**: Handle missing values, remove duplicates, fix data types


4. **Manipulating data**: Transform, filter, and organize data as needed


5. **Analyzing data**: Perform calculations, grouping, and statistical analysis


6. **Visualizing data**: Create plots and charts to visualize findings


7. **Saving results**: Export processed data or analysis results


Over the next few notebooks, we'll explore each of these steps in detail.
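
The steps above can be sketched end to end on a tiny dataset. This is only an illustration under stated assumptions: the CSV content, column names, and the choice to fill missing unit counts with 0 are all hypothetical, and the visualization step is left out.

```python
import io
import pandas as pd

# Hypothetical CSV content, inlined so the example needs no file on disk.
raw_csv = io.StringIO(
    "city,product,units,price\n"
    "Istanbul,tea,10,2.5\n"
    "Ankara,tea,,2.5\n"
    "Istanbul,coffee,4,4.0\n"
    "Istanbul,coffee,4,4.0\n"
    "Izmir,tea,7,2.5\n"
)

# 1. Load
sales = pd.read_csv(raw_csv)

# 2. Explore: sample rows and missing values per column
print(sales.head())
print(sales.isna().sum())

# 3. Clean: drop exact duplicate rows, fill the missing unit count, fix the dtype
sales = sales.drop_duplicates()
sales['units'] = sales['units'].fillna(0).astype(int)

# 4. Manipulate: derive a new column
sales['revenue'] = sales['units'] * sales['price']

# 5. Analyze: total revenue per city
revenue_by_city = sales.groupby('city')['revenue'].sum()
print(revenue_by_city)

# 7. Save: to_csv() without a path returns the CSV text as a string
csv_text = revenue_by_city.to_csv()
```

Passing a file path to `to_csv` would write the result to disk instead of returning a string.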


Example: Create a simple dataset


In [None]:
np.random.seed(42)  # For reproducibility
dates = pd.date_range('20230101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

print("Example time-series data:")
print(df)



Simple analysis


In [None]:
print("\nColumn means:")
print(df.mean())
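
`mean()` averages down each column by default; passing `axis=1` averages across each row instead. The sketch below uses a small deterministic frame (named `small` here for illustration) rather than random data so the results are easy to check by hand.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20230101', periods=3)
# Column A holds 0, 2, 4 and column B holds 1, 3, 5.
small = pd.DataFrame(np.arange(6).reshape(3, 2), index=dates, columns=['A', 'B'])

col_means = small.mean()          # one value per column
row_means = small.mean(axis=1)    # one value per date

# Rows of a date-indexed frame can be looked up by timestamp.
jan_2 = small.loc[pd.Timestamp('2023-01-02')]

print(col_means)
print(row_means)
print(jan_2)
```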



Simple visualization


In [None]:
import matplotlib.pyplot as plt

# In a Jupyter notebook, run `%matplotlib inline` first to render plots inline.

# DataFrame.plot creates its own figure and returns the Axes,
# so a separate plt.figure() call would only leave an empty extra figure.
ax = df.plot(figsize=(10, 6))
ax.set_title('Random Time Series Data')
ax.set_ylabel('Value')
ax.grid(True)

plt.show()  # Required to display the plot when running as a script


## Conclusion


This script provided a brief introduction to pandas and its capabilities. In the following scripts, we'll explore pandas data structures and operations in more detail:


1. Series: One-dimensional labeled arrays


2. DataFrames: Two-dimensional labeled data structures


3. Reading Data: Importing data from CSV and JSON files


4. Data Analysis: Techniques for exploring and analyzing data


Each script will build on these fundamentals and provide more in-depth examples. 
