# Objective
* This notebook is designed for data science, data analytics, and data engineering professionals and enthusiasts. Understanding Python's core data structures is crucial for efficiently managing, manipulating, and analyzing data. Lists, strings, dictionaries, and arrays form the building blocks for many data-related tasks, from simple analysis to complex data engineering pipelines.

# Core Data Structures in Python

# 1. Sequence Types
Sequence types are data structures that store an ordered collection of elements. The order of elements is significant, meaning that you can access them by their position (index) within the sequence.

The term "**sequence**" emphasizes the linear arrangement of elements, allowing access by index. They support operations that rely on their order, such as slicing and indexing.

* List: An ordered, mutable collection of elements.

* Tuple: An ordered, immutable collection of elements.

* String: A sequence of characters (immutable).
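As a minimal sketch of what the three sequence types share and where they differ, the snippet below indexes and slices each one, then tries to modify them in place. The variable names and values are hypothetical and chosen only for illustration.

```python
# Hypothetical sample data, for illustration only
readings = [12, 15, 11, 18]   # list: ordered and mutable
point = (3.5, 7.2)            # tuple: ordered and immutable
label = "sensor_A"            # str: ordered and immutable

# Indexing and slicing work the same way on all three sequence types
first_reading = readings[0]   # 12
last_two = readings[-2:]      # [11, 18]
y_coordinate = point[1]       # 7.2
prefix = label[:6]            # 'sensor'

# Only the list can be changed in place
readings[0] = 13

# Assigning to a tuple element raises TypeError
try:
    point[0] = 4.0
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

print(readings, last_two, prefix, tuple_is_mutable)
```

Strings behave like tuples here: `label[0] = "S"` would raise the same `TypeError`.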

# 2. Mapping Type
A mapping type is a collection of key-value pairs, where each key is unique and maps to a specific value. You can retrieve, insert, or modify values using their associated keys.

The name "mapping" reflects the concept of mapping unique keys to specific values, allowing for efficient data retrieval. It emphasizes the association between keys and values rather than a sequential arrangement.

* Dictionary: A mutable collection of key-value pairs with unique keys. Since Python 3.7, dictionaries preserve insertion order.
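The retrieve, insert, and modify operations described above can be sketched as follows. The `stock` dictionary and its contents are hypothetical and serve only as an illustration.

```python
# Hypothetical inventory data, for illustration only
stock = {"apples": 30, "pears": 12}

stock["plums"] = 8                # insert a new key-value pair
stock["apples"] = 25              # modify the value of an existing key
pears = stock["pears"]            # retrieve a value by key
kiwis = stock.get("kiwis", 0)     # .get() returns a default instead of raising KeyError

# Iterating over a dict yields its keys in insertion order (Python 3.7+)
keys_in_order = list(stock)

print(stock, pears, kiwis, keys_in_order)
```

Using `.get()` with a default is one common way to avoid a `KeyError` when a key may be absent.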

# 3. Set Types

Set types are collections of unique elements that are unordered. They support mathematical set operations like union, intersection, and difference.

The term "**set**" is derived from mathematical terminology, where a set is defined as a collection of distinct objects. This name highlights the properties of uniqueness and unordered nature, distinguishing them from other collections.


* Set: A mutable, unordered collection of unique elements.

* Frozenset: An immutable version of a set.
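A short sketch of the set operations mentioned above, using hypothetical customer names. It also shows one reason to reach for a frozenset: because it is immutable and hashable, it can serve as a dictionary key, which a regular set cannot.

```python
# Hypothetical customer groups, for illustration only
north_customers = {"ana", "ben", "cho"}
south_customers = {"ben", "dev"}

either = north_customers | south_customers      # union
both = north_customers & south_customers        # intersection
north_only = north_customers - south_customers  # difference

# Building a set from a list drops duplicate values
unique_ids = set([3, 1, 3, 2, 1])

# A frozenset is hashable, so it can be used as a dictionary key
region_pair = frozenset({"north", "south"})
shipping_cost = {region_pair: 4.99}

print(sorted(either), sorted(both), sorted(north_only), sorted(unique_ids))
```

The results are passed through `sorted()` before printing because sets have no guaranteed display order.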

# 4. Binary Types
Binary types represent raw binary data. They are used for handling data that isn't easily represented as standard text, such as images, audio files, or other byte-based data.

The name "**binary**" indicates that these types deal specifically with raw bytes, where each element is an integer in the range 0 to 255. This term distinguishes them from text-based data structures, focusing on their utility for low-level data handling.

* Bytes: A sequence of bytes (immutable).

* Bytearray: A mutable sequence of bytes.

* Memoryview: A view onto the memory of an object that supports the buffer protocol, such as bytes or bytearray, without copying the data.
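The following sketch contrasts the three binary types. The byte values are illustrative: the first four bytes of the PNG file signature stand in for image data, and the other names are hypothetical.

```python
# bytes: immutable; each element is an integer from 0 to 255
header = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of the PNG signature
first_byte = header[0]                     # 137
format_tag = header[1:4]                   # b'PNG'

# bytearray: mutable, so individual bytes can be replaced in place
buffer = bytearray(b"data")
buffer[0] = ord("D")                       # buffer is now bytearray(b'Data')

# memoryview: reads and writes the underlying buffer without copying it
view = memoryview(buffer)
view[1:3] = b"AT"                          # the change is visible through `buffer`

# Text becomes bytes by encoding; non-ASCII characters may take several bytes
encoded = "café".encode("utf-8")           # 5 bytes for 4 characters
decoded = encoded.decode("utf-8")

print(first_byte, format_tag, buffer, len(encoded), decoded)
```

Writing through the memoryview only works here because the underlying object is a bytearray; a view over a `bytes` object is read-only.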

# Summary of Core Structures
* **List:** Mutable sequence, allows duplicates.

* **Tuple:** Immutable sequence, allows duplicates.

* **String:** Immutable sequence of characters.

* **Dictionary:** Mutable mapping, keys must be unique.

* **Set:** Mutable collection of unique elements.

* **Frozenset:** Immutable collection of unique elements.

* **Bytes:** Immutable sequence of bytes.

* **Bytearray:** Mutable sequence of bytes.

* **Memoryview:** A view into the memory of another byte-based object.

# Lists in Python
* Description: Lists are useful for managing datasets where items need to be **ordered, indexed, and modified**. They can store a mix of data types, but are especially handy for manipulating collections of similar data such as rows from a database or data from a CSV file.

In [1]:
# Example: Storing daily sales figures
daily_sales = [1000, 1500, 1200, 1100, 1400]

# Adding a new day's sales
daily_sales.append(1600)
print("Updated sales:", daily_sales)

# Calculating total sales for the week
total_sales = sum(daily_sales)
print("Total sales for the week:", total_sales)


Updated sales: [1000, 1500, 1200, 1100, 1400, 1600]
Total sales for the week: 7800
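Beyond appending and summing, the ordered and indexed nature of lists supports slicing, filtering, and sorting. The sketch below reuses the sales figures from the cell above; the 1400 threshold is an arbitrary value picked for illustration.

```python
daily_sales = [1000, 1500, 1200, 1100, 1400, 1600]

# Slicing returns a new list with the selected positions
first_three = daily_sales[:3]                          # [1000, 1500, 1200]

# A list comprehension filters values (threshold is hypothetical)
strong_days = [s for s in daily_sales if s >= 1400]    # [1500, 1400, 1600]

# sorted() returns a new list and leaves the original order untouched
ranked = sorted(daily_sales, reverse=True)

# index() finds the position of the first matching value
best_day_index = daily_sales.index(max(daily_sales))   # 5

print(first_three, strong_days, ranked, best_day_index)
```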


# Strings 
* Strings are fundamental when working with text data, which is common in data analytics tasks like parsing CSV files, working with dates, or cleaning datasets.

In [2]:
# Example: Cleaning a column name in a dataset
column_name = " customer_age "
cleaned_column_name = column_name.strip().replace(" ", "_")
print("Cleaned column name:", cleaned_column_name)

# Splitting a string of customer feedback into words
feedback = "This product was amazing and delivery was fast"
words = feedback.split()
print("Feedback as a list of words:", words)


Cleaned column name: customer_age
Feedback as a list of words: ['This', 'product', 'was', 'amazing', 'and', 'delivery', 'was', 'fast']
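The CSV parsing and date handling mentioned above can be sketched with string methods alone. The `raw_row` value and its three-field layout are assumptions made for this example; real CSV files with quoted fields are better handled by the standard `csv` module.

```python
from datetime import datetime

# Hypothetical comma-separated row with stray whitespace
raw_row = " 2024-03-15 , Widget ,  19.99 "

# Split on commas, then strip whitespace from each field
fields = [field.strip() for field in raw_row.split(",")]

# Convert each field from text to a more useful type
order_date = datetime.strptime(fields[0], "%Y-%m-%d").date()
product = fields[1].lower()
price = float(fields[2])

# f-strings format the cleaned values back into text
summary = f"{product} sold for {price:.2f} on {order_date.isoformat()}"
print(summary)
```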


# Dictionaries 
* Dictionaries are ideal for storing key-value pairs, which are very useful in data engineering tasks such as mapping feature names to values, or creating a lookup for faster data processing.

In [3]:
# Example: Mapping feature names to their types in a dataset
feature_types = {
    "age": "int",
    "salary": "float",
    "city": "string"
}
print("Feature type for age:", feature_types["age"])

# Modifying the feature type for salary
feature_types["salary"] = "double"
print("Updated feature types:", feature_types)


Feature type for age: int
Updated feature types: {'age': 'int', 'salary': 'double', 'city': 'string'}
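As a sketch of the lookup use case, the snippet below maps cities to regions and counts orders per region. The city names, region codes, and the `"UNKNOWN"` fallback are all hypothetical.

```python
# Hypothetical lookup table and order data, for illustration only
city_to_region = {"Lyon": "EU", "Austin": "NA", "Osaka": "APAC"}
orders = ["Lyon", "Austin", "Lyon", "Quito", "Osaka", "Lyon"]

region_counts = {}
for city in orders:
    # .get() supplies a fallback for cities missing from the lookup
    region = city_to_region.get(city, "UNKNOWN")
    # Start the count at 0 the first time a region appears
    region_counts[region] = region_counts.get(region, 0) + 1

print(region_counts)
```

Each lookup takes roughly constant time on average, which is why a dictionary tends to scale better than scanning a list of pairs. For counting specifically, `collections.Counter` from the standard library is a common alternative.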


# Arrays
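* Arrays are especially useful when working with numerical data and performing vectorized operations. They are not a built-in type: the examples here use arrays from NumPy, a third-party library. Because a NumPy array stores elements of a single data type in a contiguous block of memory, it is typically faster and more memory-efficient than a list for large numerical datasets, making it essential in data science for tasks like matrix operations, linear algebra, or statistics.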

In [4]:
import numpy as np

# Example: Creating an array of monthly sales data
monthly_sales = np.array([1000, 1500, 1200, 1100, 1400, 1600])
print("Monthly sales data:", monthly_sales)

# Calculating the mean sales for the first half of the year
mean_sales = np.mean(monthly_sales)
print("Mean monthly sales:", mean_sales)

# Simulating data for analysis (unseeded, so the values differ on each run)
simulated_data = np.random.normal(loc=50, scale=10, size=1000)
print("First 5 simulated data points:", simulated_data[:5])


Monthly sales data: [1000 1500 1200 1100 1400 1600]
Mean monthly sales: 1300.0
First 5 simulated data points: [58.35840116 46.30164301 67.02186447 48.86464    30.70403624]
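To illustrate what "vectorized" means in practice, the sketch below applies operations to the whole array at once, without an explicit loop. It reuses the sales figures from the cell above. The seed value 42 is an arbitrary choice, included to show one way of making simulated data reproducible.

```python
import numpy as np

monthly_sales = np.array([1000, 1500, 1200, 1100, 1400, 1600])

# Differences between consecutive months, computed in one call
month_over_month = np.diff(monthly_sales)

# A boolean mask selects the months with above-average sales
above_mean = monthly_sales[monthly_sales > monthly_sales.mean()]

# A seeded generator produces the same random draws on every run
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
sample_a = rng_a.normal(loc=50, scale=10, size=5)
sample_b = rng_b.normal(loc=50, scale=10, size=5)

print(month_over_month, above_mean, np.array_equal(sample_a, sample_b))
```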
