## Introduction to Python for Data Analysis 2

# Python for Data Science: Collections and Data Structures

## Introduction

In Python, there are several built-in data structures used to store collections of data. These include lists, sets, tuples, and dictionaries. Additionally, in data science, we often use arrays and data frames provided by libraries such as NumPy and pandas. This section will cover their characteristics, uses, and some basic operations.


## 1. List (list)

### Characteristics
- Ordered collection of items
- Mutable (can be changed after creation)
- Allows duplicate elements
- Elements can be of different data types

### Uses
- Storing a sequence of items
- Common operations: indexing, slicing, appending, and iterating

In [2]:
# Example of list
fruits = ["apple", "banana", "cherry"]
fruits.append("orange")  # Adding an item
print(fruits)            # Output: ['apple', 'banana', 'cherry', 'orange']
print(fruits[1])         # Output: banana

['apple', 'banana', 'cherry', 'orange']
banana
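
The example above covers appending and indexing. The short sketch below illustrates the other two operations listed under Uses, slicing and iterating. The variable names `first_two`, `last_item`, and `labelled` are introduced here purely for illustration.

```python
# Sample data mirroring the list built in the example above
fruits = ["apple", "banana", "cherry", "orange"]

# Slicing returns a new list; the end index is exclusive
first_two = fruits[:2]

# Negative indices count from the end of the list
last_item = fruits[-1]

# enumerate() yields the position and the value together while iterating
labelled = []
for position, fruit in enumerate(fruits):
    labelled.append(f"{position}: {fruit}")

print(first_two)   # ['apple', 'banana']
print(last_item)   # orange
print(labelled)    # ['0: apple', '1: banana', '2: cherry', '3: orange']
```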


## 2. Set (set)
### Characteristics
- Unordered collection of unique items
- Mutable
- Does not allow duplicate elements

### Uses
- Storing unique items
- Common operations: union, intersection, difference

In [36]:
# Example of set
numbers = {1, 2, 3, 4, 5}
numbers.add(7)          # Adding a new item
print(numbers)          # Output: {1, 2, 3, 4, 5, 7}


{1, 2, 3, 4, 5, 7}


In [34]:
numbers.add(6)
numbers

{1, 2, 3, 4, 5, 6, 7}
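
The set operations named under Uses (union, intersection, difference) can be sketched as follows. The sets `evens` and `small` and the list `raw_ids` are made-up sample data, not part of the original notebook.

```python
# Hypothetical sample sets, chosen only for illustration
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

union = evens | small        # items in either set
common = evens & small       # items in both sets (intersection)
only_evens = evens - small   # items in evens but not in small (difference)

# Converting a list to a set drops the duplicates
raw_ids = [1, 1, 2, 2, 3]
unique_ids = set(raw_ids)

print(union == {1, 2, 3, 4, 6, 8})   # True
print(common == {2, 4})              # True
print(only_evens == {6, 8})          # True
print(unique_ids == {1, 2, 3})       # True
```

The results are compared with `==` rather than printed directly because a set has no guaranteed display order.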

## 3. Tuple (tuple)
### Characteristics
- Ordered collection of items
- Immutable (cannot be changed after creation)
- Allows duplicate elements
- Elements can be of different data types

### Uses
- Storing a fixed sequence of items
- Common operations: indexing, slicing

In [5]:
# Example of tuple
coordinates = (10.0, 20.0)
print(coordinates[0])   # Output: 10.0


10.0
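
To make the "immutable" characteristic concrete, the sketch below unpacks a tuple and then shows that assigning to one of its positions raises a `TypeError`. The names `point`, `modified`, and `labels` are illustrative assumptions.

```python
# Hypothetical coordinate pair, similar to the example above
point = (10.0, 20.0)

# Unpacking assigns each element to its own variable
x, y = point

# Tuples do not support item assignment, so this raises TypeError
try:
    point[0] = 99.0
    modified = True
except TypeError:
    modified = False

# A tuple of hashable items can serve as a dictionary key (a list cannot)
labels = {point: "site A"}

print(x, y)            # 10.0 20.0
print(modified)        # False
print(labels[point])   # site A
```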


## 4. Dictionary (dict)
### Characteristics
- Collection of key-value pairs that preserves insertion order (Python 3.7+)
- Keys must be unique and hashable (e.g., strings, numbers, tuples)
- Values can be of any data type
- Mutable

### Uses
- Storing data pairs (e.g., mapping names to values)
- Common operations: accessing, adding, and removing key-value pairs

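In [39]:
# The same two strings stored in three different structures
# (the variable names avoid shadowing the built-in list, set, and dict types)
names_list = ["name", "nurudeen"]
names_set = {"name", "nurudeen"}
names_dict = {"name": "Nurudeen"}

In [42]:
names_list[1]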

'nurudeen'

In [13]:
# Example of dictionary
student = {"name": "Alice", "age": 25, "is_student": True}
student["age"] = 26      # Updating a value
print(student["name"])   # Output: Alice


Alice
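
The example above shows updating and accessing a value. A minimal sketch of the remaining operations listed under Uses (adding and removing key-value pairs) might look like this. The `"major"` and `"nickname"` keys are hypothetical and were chosen only for illustration.

```python
# Starting point similar to the dictionary in the example above
student = {"name": "Alice", "age": 26}

# Assigning to a new key adds a key-value pair
student["major"] = "Statistics"

# pop() removes a key and returns its value
removed_age = student.pop("age")

# get() returns a default instead of raising KeyError for a missing key
nickname = student.get("nickname", "n/a")

# items() iterates over key-value pairs in insertion order
pairs = [f"{key}={value}" for key, value in student.items()]

print(removed_age)   # 26
print(nickname)      # n/a
print(pairs)         # ['name=Alice', 'major=Statistics']
```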


## 5. Array (array from NumPy)
### Characteristics
- Ordered collection of items
- Elements are of the same data type
- Mutable
- More efficient than lists for numerical operations

### Uses
- Numerical computations
- Common operations: element-wise operations, slicing, mathematical functions

In [62]:
import numpy as np

# Example of array
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr + 2)           # Output: [[3 4 5] [6 7 8]] (2 is added to every element)
# print(arr * 3)         # Output: [[ 3  6  9] [12 15 18]]


[[3 4 5]
 [6 7 8]]


In [56]:
arr

array([[1, 2, 3],
       [4, 5, 6]])

In [59]:
arr[1, 2]   # row 1, column 2 (same result as arr[1][2])


6
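
The operations listed under Uses (element-wise operations, slicing, mathematical functions) can be sketched on the same 2x3 array. The variable names below are illustrative. The block assumes NumPy is installed.

```python
import numpy as np

# Same 2x3 array as in the example above
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Element-wise arithmetic applies to every item, with no explicit loop
doubled = arr * 2

# Slicing: the first row, and the second column across all rows
first_row = arr[0]
second_col = arr[:, 1]

# Mathematical functions can work along an axis (axis=0 means per column)
col_means = arr.mean(axis=0)
roots = np.sqrt(arr)

print(doubled.tolist())      # [[2, 4, 6], [8, 10, 12]]
print(first_row.tolist())    # [1, 2, 3]
print(second_col.tolist())   # [2, 5]
print(col_means.tolist())    # [2.5, 3.5, 4.5]
```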

## 6. Data Frame (DataFrame from pandas)
### Characteristics
- 2-dimensional, size-mutable, and heterogeneous tabular data
- Labeled axes (rows and columns)
- Mutable
- Allows missing data

### Uses
- Data manipulation and analysis
- Common operations: indexing, filtering, aggregating, merging

In [66]:
import pandas as pd

# Example of DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "Olaide"],
    "Age": [25, 30, 35, 60],
    "Occupation": ["Engineer", "Doctor", "Artist", "Pharmacist"]
}

df = pd.DataFrame(data)
df
# Output:
#       Name  Age  Occupation
# 0    Alice   25    Engineer
# 1      Bob   30      Doctor
# 2  Charlie   35      Artist
# 3   Olaide   60  Pharmacist


      Name  Age  Occupation
0    Alice   25    Engineer
1      Bob   30      Doctor
2  Charlie   35      Artist
3   Olaide   60  Pharmacist
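
The common operations listed under Uses (indexing, filtering, aggregating, merging) can be sketched on the same data. The second table, `cities`, and its values are hypothetical and exist only to demonstrate a merge. The block assumes pandas is installed.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Olaide"],
    "Age": [25, 30, 35, 60],
    "Occupation": ["Engineer", "Doctor", "Artist", "Pharmacist"],
})

# Indexing: select a single column as a Series
ages = df["Age"]

# Filtering: keep only the rows where the condition is True
over_30 = df[df["Age"] > 30]

# Aggregating: reduce a column to a single value
mean_age = df["Age"].mean()

# Merging: a hypothetical lookup table that covers only two of the names
cities = pd.DataFrame({"Name": ["Alice", "Bob"], "City": ["Lagos", "Abuja"]})
merged = df.merge(cities, on="Name", how="left")

print(over_30["Name"].tolist())      # ['Charlie', 'Olaide']
print(mean_age)                      # 37.5
print(merged["City"].isna().sum())   # 2
```

A left merge keeps every row of `df`. The names with no match in `cities` receive a missing value in the `City` column, which illustrates the "allows missing data" characteristic.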


## Summary
Understanding these data structures and their uses is essential for effective data manipulation and analysis in Python. Each data structure has its own strengths and is suited for different types of tasks:

- *Lists* are versatile and easy to use for ordered collections.
- *Sets* are useful for storing unique items and performing set operations.
- *Tuples* are great for storing fixed sequences of items.
- *Dictionaries* are excellent for key-value mappings.
- *Arrays* (from NumPy) are efficient for numerical computations.
- *Data Frames* (from pandas) are powerful for handling and analyzing