# Lux Tech Academy & Data Science East Africa Bootcamp

### Data Analysis and Analytics Project Based Learning Bootcamp

#### This Notebook contains week 2 project assignment for the Bootcamp training from September 23<sup>rd</sup> to October 11<sup>th</sup>, 2024

#### Project assignment

1. What is garbage collection in the context of Python, and why is it important? Can you explain how memory management is handled in Python?
2. What are the key differences between NumPy arrays and Python lists, and can you explain the advantages of using NumPy arrays in numerical computations?
3. How does list comprehension work in Python, and can you provide an example of using it to generate a list of squared values or filter a list based on a condition?
4. Can you explain the concepts of shallow and deep copying in Python, including when each is appropriate, and how deep copying is implemented?
5. Explain, with examples, the difference between lists and tuples.

### Garbage Collection
- Lack of effective memory management leads to slow performance of applications, unexpected crashes, and even memory leaks.
- **Garbage Collection** is a memory management technique used in programming languages to automatically free up memory that is no longer accessible or in use by the application. It helps prevent memory leaks, optimize memory usage, and ensure efficient memory allocation for the program.

#### Why is garbage collection necessary?
- When writing code, we create objects that store data, perform calculations, or manage tasks. However, these objects continue to occupy memory after they have served their purpose, unless they are removed.
- When unused objects accumulate, they cause your application to use more memory than necessary, slowing down its performance.
- Garbage collection automatically detects when objects are no longer needed and safely removes them from memory. This helps to:
    - **Prevent memory leaks** - Unused objects are removed from memory automatically, reducing the risk of leaks (memory that is no longer needed but never released).
    - **Simplify the process of writing code** - Python handles memory management automatically, so you can focus on writing code rather than worrying about memory management.
    - **Optimize performance** - Freeing up memory helps maintain the performance of your application, especially in programs that handle large amounts of data.

#### How Python handles memory management
- At the most basic level, Python combines two memory management techniques to implement garbage collection: *reference counting* and *generational garbage collection*.
- Together, the two methods ensure that memory is managed properly, minimise the chances of memory leaks, and help optimize application performance.
#### 1. **Reference counting**
This is a fundamental method used to manage memory in Python. It involves keeping track of the number of references to an object in memory. Every time a new reference to an object is created, Python will increase the reference count of that object. Similarly, when a reference is removed, Python will decrease the reference count.
- Tracking References: Every object in Python has a reference count, which is updated whenever the object is referenced or dereferenced. For instance, assigning an object to a variable or passing it to a function increases its reference count, while deleting the variable decreases the count.
- Deallocating Memory: When an object’s reference count drops to zero, meaning no part of your code is using the object, Python automatically deallocates the memory it occupies.

However, reference counting has a limitation: it cannot handle cyclic references, which occur when two or more objects reference one another, forming a cycle. In such cases, the reference counts never reach zero, so reference counting alone cannot reclaim the memory.
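
Reference counting can be observed directly. The following is a minimal sketch, assuming CPython. `sys.getrefcount` is part of the standard library, but the absolute number it returns varies between interpreter versions (the call itself may hold a temporary reference), so only the differences are compared. The variable names are illustrative.

```python
import sys

data = [1, 2, 3]                     # the name `data` holds a reference to the list
baseline = sys.getrefcount(data)     # absolute value depends on the interpreter

alias = data                         # a second name bound to the same list
after_alias = sys.getrefcount(data)  # one more than the baseline

del alias                            # removing the name drops the count again
after_del = sys.getrefcount(data)

print(after_alias - baseline)  # 1
print(after_del - baseline)    # 0
```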

#### 2. **Generational garbage collection**
This method overcomes the limitation of reference counting: it is designed to detect and collect cyclic references, which occur when two or more objects reference one another. It is based on the observation that most objects are short-lived (temporary objects), while the few that survive tend to be long-lived (persistent objects). Categorising objects by age helps optimize Python's garbage collection performance.
- *Generations* - Python’s garbage collector organizes objects into three generations: Generation 0 (youngest), Generation 1 (middle-aged), and Generation 2 (oldest). New objects are placed in Generation 0, and if they survive garbage collection, they move to the next generation.
- *Prioritizing Younger Objects* - The garbage collector runs more frequently on younger objects (Generation 0) because these objects are more likely to become unused quickly. As objects age and are moved to higher generations, they are collected less frequently. This approach reduces the overhead of garbage collection by focusing more on objects that are likely to be discarded soon.
- *Handling Cyclic References* - Generational garbage collection is particularly effective at identifying and collecting objects involved in cyclic references. During the collection process, Python’s garbage collector can detect these cycles and reclaim the memory, preventing memory leaks caused by cyclic references.
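
The cycle handling described above can be sketched with the standard `gc` and `weakref` modules. The `Node` class is a hypothetical helper that exists only to build a cycle, and automatic collection is paused so that the result does not depend on when the collector happens to run.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only to build a reference cycle."""

    def __init__(self, name):
        self.name = name
        self.partner = None


gc_was_enabled = gc.isenabled()
gc.disable()  # pause automatic collection so the demo is deterministic

first, second = Node("first"), Node("second")
first.partner = second   # first -> second
second.partner = first   # second -> first: the two objects now form a cycle

probe = weakref.ref(first)  # observe the object without keeping it alive
del first, second           # no names left, but the cycle keeps both counts above zero

alive_before = probe() is not None  # True: reference counting alone cannot free it
found = gc.collect()                # run a full collection over all generations
alive_after = probe() is not None   # False: the cycle detector reclaimed it

if gc_was_enabled:
    gc.enable()

print(alive_before, alive_after)  # True False
print(gc.get_threshold())         # collection thresholds, one per generation
```

`gc.collect()` returns the number of unreachable objects it found, which is why `found` is at least 2 here.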

### Key Differences Between NumPy arrays and Python lists

**NumPy** is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

- *Data Type Homogeneity* - NumPy arrays can only store elements of the same data type (e.g., all integers or all floats), while Python lists can hold elements of different types (e.g., integers, strings, etc.).
- *Performance* - NumPy arrays are more efficient and faster for numerical computations due to contiguous memory allocation and the use of vectorized operations.
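- *Size* - NumPy arrays typically consume less memory than Python lists holding the same numerical data, because they store raw values in one compact buffer rather than references to separate Python objects.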
- *Functionality* - NumPy provides a wide range of functions optimized for mathematical operations, such as matrix manipulations, linear algebra, and random number generation.
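
A short sketch of the homogeneity and size points, assuming NumPy is installed and running on CPython. The list size estimate adds the container and its integer objects; exact byte counts for the list vary by Python version, so only the comparison matters.

```python
import sys

import numpy as np

# Homogeneity: mixed ints and floats are upcast to one common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# A list can hold values of different types side by side.
record = [1, "two", 3.0]

# Size: 1000 64-bit integers stored as raw values vs. as Python objects.
arr = np.arange(1000, dtype=np.int64)
lst = list(range(1000))

array_bytes = arr.nbytes  # 8000 = 1000 items * 8 bytes (data buffer only)
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(n) for n in lst)

print(array_bytes)               # 8000
print(list_bytes > array_bytes)  # True
```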

#### Advantages of NumPy arrays

- *Speed* - NumPy arrays allow for fast array operations, often implemented in compiled C code, which makes them much faster than native Python lists.
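- *Vectorization and broadcasting* - NumPy applies element-wise operations to whole arrays without explicit Python loops, and broadcasting extends this to arrays of different but compatible shapes, which makes it ideal for handling large datasets in data science.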
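
To illustrate element-wise operations without explicit loops, here is a small sketch assuming NumPy is installed. The data values and variable names are made up for the example.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]

# With a list, doubling every value needs an explicit loop or comprehension.
doubled_list = [p * 2 for p in prices]

# With an array, the operation applies to every element at once.
arr = np.array(prices)
doubled_arr = arr * 2
print(doubled_arr.tolist())  # [20.0, 40.0, 60.0]

# Broadcasting: a (2, 3) array combined with a (3,) array.
grid = np.array([[1, 2, 3], [4, 5, 6]])
offsets = np.array([10, 20, 30])
shifted = grid + offsets     # offsets is applied to each row
print(shifted.tolist())      # [[11, 22, 33], [14, 25, 36]]
```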

### List Comprehension in Python - Smart List

- There are instances where we want to create a list at runtime, build a list from another list, or create a list that depends on user data
- For example, suppose we want to create a list of even numbers at runtime. With a regular loop, this is how to do it:
    ```
    even=[]
    for x in range(1,11):
        if x%2==0:
            even.append(x)
    print(even)
    ```

- List comprehension is an elegant way to define and create a list in Python
- We can create lists much like mathematical statements, in a single line
- The syntax of a list comprehension is easy to grasp. It consists of the following parts:
    1. output expression
    2. input sequence
    3. a variable representing a member of the input sequence
    4. an optional predicate part
- Using list comprehension to create a list of even numbers in a given range:
    ```
    even=[x for x in range(1,11) if x % 2 ==0]
    print(even)
    ```

- Here,
    * `x` is the output expression
    * `range(1,11)` is the input sequence
    * `x` is the variable representing each member of the input sequence
    * `if x % 2 == 0` is the predicate part/condition

- ***Note***: Not all parts of a list comprehension are mandatory. For example:
    ```
    string="Hello, list"
    characters_list=[x for x in string]
    print(characters_list)
    ```
- ***Note***: There is no condition (predicate) in this list comprehension
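
The output expression does not have to be the variable itself. The sketch below uses a made-up list of words to show all four parts at once, next to the equivalent explicit loop.

```python
words = ["lux", "data", "science", "py"]

# output expression: len(word)   variable: word
# input sequence: words          predicate: len(word) > 2
lengths = [len(word) for word in words if len(word) > 2]

# The equivalent explicit loop, for comparison
lengths_loop = []
for word in words:
    if len(word) > 2:
        lengths_loop.append(len(word))

print(lengths)                  # [3, 4, 7]
print(lengths == lengths_loop)  # True
```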

```
# Generate a list of squared values
squared_values = [x**2 for x in range(10)]
print(squared_values)
```
**Output**
```
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```


```
# Filter a list based on a condition
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
even_numbers = [x for x in numbers if x % 2 == 0]
print(even_numbers)
```
**Output**
```
[2, 4, 6, 8]
```


### Shallow and Deep Copying in Python
- Assignment statements in Python do not copy objects; they create a binding between a target and an object, i.e. the `=` operator creates a new name that refers to the same object as the original. To make real copies of these objects, we use the `copy` module.
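
A quick illustration of why plain assignment is not a copy (the variable names are illustrative):

```python
original = [1, 2, 3]
alias = original          # no copy: both names refer to the same list

alias.append(4)           # mutating through one name...
print(original)           # [1, 2, 3, 4]  ...is visible through the other
print(alias is original)  # True
```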

+ *Shallow copy* - A shallow copy creates a new object but does not create copies of the objects that are nested inside the original object. Instead, it copies references to those nested objects, meaning any changes to the nested objects will affect both the original and the shallow copy.
+ A shallow copy constructs a new compound object and then inserts references into it to the objects found in the original.

```
import copy
original_list = [[1, 2], [3, 4]]
shallow_copy = copy.copy(original_list)
shallow_copy[0][0] = 100
print(original_list)  # Output: [[100, 2], [3, 4]]
```

+ *Deep copy* - A deep copy creates a new object and recursively copies all objects nested within the original, creating entirely independent objects. Changes to the nested objects in the copy do not affect the original.
+ First, a new collection object is constructed and then it is recursively populated with copies of the child objects contained in the original.
+ A deep copy results in a completely independent copy of the original, including all nested objects i.e. changes made to a copy of the object do not reflect in the original object.

```
import copy
original_list = [[1, 2], [3, 4]]
deep_copy = copy.deepcopy(original_list)
deep_copy[0][0] = 100
print(original_list)  # Output: [[1, 2], [3, 4]]
```

+ A modification to an element in *deep_copy* does not affect *original_list*.
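
On how deep copying is implemented: `copy.deepcopy` walks the object recursively and keeps a memo dictionary of the objects it has already copied, which lets it cope with objects that refer to themselves. A small sketch using a self-referencing list (the names are illustrative):

```python
import copy

loop = [1, 2]
loop.append(loop)            # the list now contains itself

clone = copy.deepcopy(loop)  # terminates thanks to the memo of already-copied objects

print(clone is loop)         # False: a new top-level list
print(clone[2] is clone)     # True: the cycle is reproduced inside the copy
print(clone[2] is loop)      # False: the copy does not point back at the original
```

Custom classes can control this behaviour by defining a `__deepcopy__` method.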

**When to use:**

- Shallow copy is appropriate when the object has no nested mutable objects (e.g., a flat list of integers or strings), or when sharing the nested objects with the original is intended.
- Deep copy is appropriate when the copy must not share any nested objects with the original.
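
For a flat list, a shallow copy is already fully independent. The sketch below also shows two other common ways of making a shallow copy of a list, `list.copy()` and slicing.

```python
import copy

flat = [1, 2, 3]
via_module = copy.copy(flat)
via_method = flat.copy()   # same effect as copy.copy for lists
via_slice = flat[:]        # slicing also produces a shallow copy

via_module[0] = 99         # changing an element of the copy...
print(flat)                # [1, 2, 3]  ...leaves the original untouched
print(via_method == via_slice == flat)  # True
```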

*The main difference between deep copy and shallow copy in Python is how they handle the objects they copy:*


- *Shallow Copy* - Creates a new object, but inserts references into it to the objects found in the original. It copies the top-level structure of the object, but not the nested objects within.
- *Deep Copy* - Creates a new object and recursively copies all objects found in the original. This means it duplicates not just the top-level structure but also all nested structures.

```
import copy
li1 = [1, 2, [3, 5], 4]
li2 = copy.copy(li1)
print("li2 ID: ", id(li2), "Value: ", li2)
li3 = copy.deepcopy(li1)
print("li3 ID: ", id(li3), "Value: ", li3)
```
**Output**
```
li2 ID:  2843915727232 Value:  [1, 2, [3, 5], 4]
li3 ID:  2843914389632 Value:  [1, 2, [3, 5], 4]
```


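The code above prints the IDs and values of li2 and li3. Both copies are new top-level objects with their own IDs and equal values; the distinction between shallow and deep copies lies in the nested list `[3, 5]`, which the shallow copy shares with the original and the deep copy duplicates.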
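
Top-level IDs alone do not separate the two kinds of copy. A sketch that reuses the same `li1`, `li2` and `li3` and checks the identity of the nested list makes the difference visible:

```python
import copy

li1 = [1, 2, [3, 5], 4]
li2 = copy.copy(li1)      # shallow copy
li3 = copy.deepcopy(li1)  # deep copy

print(li2 is li1, li3 is li1)  # False False: both are new top-level lists
print(li2[2] is li1[2])        # True: the shallow copy shares the nested list
print(li3[2] is li1[2])        # False: the deep copy has its own nested list

li1[2].append(7)               # mutate the nested list through the original
print(li2[2])                  # [3, 5, 7]
print(li3[2])                  # [3, 5]
```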

### Lists And Tuples in Python

#### Lists
- A list in Python is a data structure that allows us to store multiple values of various data types in a single variable
- It is a sequence type data structure just like a string
- We can access the values individually using the concept of index numbers
- Important features of a list are:
    * Lists are ordered collections of multiple values
    * Lists are mutable, i.e. the values of a list can be changed and manipulated over time
    * Lists allow us to store duplicate values
    * Lists are created by placing the sequence inside square brackets `[...]`
- The elements inside the sequence are comma separated
- It can have any number of items and they may be of different types (integer, float, string, etc)
    ```
    list1=[]
    list2=[1,4,5,7,3,9]
    list3=["Edwin",23,34.9,True]
    ```

### Manipulating a list
1. Accessing elements in a list
2. List slicing
3. Updating a list
4. Deleting elements from a list
- Since the list is a sequence data type, we can access elements of a list using indices
- Just like strings, the elements of a list are assigned with an index starting from zero
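
The four manipulations listed above can be sketched as follows (the list contents are made up for the example):

```python
fruits = ["apple", "banana", "cherry", "date"]

# 1. Accessing elements by index (negative indices count from the end)
first = fruits[0]    # "apple"
last = fruits[-1]    # "date"

# 2. Slicing returns a new list
middle = fruits[1:3]  # ["banana", "cherry"]

# 3. Updating an element in place
fruits[1] = "blueberry"

# 4. Deleting an element by index
del fruits[0]

print(fruits)  # ['blueberry', 'cherry', 'date']
```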

### Operations on lists
- concatenation
- repetition
- membership

- Adding two lists is list concatenation, and multiplying a list by an integer is repetition
- We use the `+` and `*` operators respectively
    ```
    lst1=[1,2,3]
    lst2=[4,5,6]
    print(lst1+lst2)
    print(lst1*2)
    ```

- The membership operator `in` is used to test whether an element is present in a list
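
A brief example of the membership test, using the same `lst1` values as above:

```python
lst1 = [1, 2, 3]

has_two = 2 in lst1        # True
has_five = 5 in lst1       # False
lacks_five = 5 not in lst1 # True

print(has_two, has_five, lacks_five)  # True False True
```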

### built-in functions for lists
* len(list) - returns the length of the list
* min(list) - returns the element of the list with the minimum value
* max(list) - returns the element of the list with the maximum value
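
For example, with a made-up list of scores:

```python
scores = [72, 95, 58, 81]

size = len(scores)     # 4
lowest = min(scores)   # 58
highest = max(scores)  # 95

print(size, lowest, highest)  # 4 58 95
```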

### list methods
- methods can be accessed using the syntax `list.method()`. They include:
    1. append(element): adds the specified element to the end of the list
    2. insert(index, element): inserts the element at the specified position or index
    3. pop(): removes and returns the last element from the list
    4. remove(element): removes the first occurrence of the specified element from the list
    5. reverse(): reverses the order of elements in the list, in place
    6. index(element): returns the index of the first matched item
    7. count(element): returns how many times an element occurs in the list
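
A short walk through these methods on one list, with the state after each call shown in the comments (the values are illustrative):

```python
nums = [3, 1, 2]

nums.append(4)       # [3, 1, 2, 4]
nums.insert(1, 10)   # [3, 10, 1, 2, 4]  (index first, then element)
last = nums.pop()    # returns 4, leaving [3, 10, 1, 2]
nums.remove(10)      # [3, 1, 2]
nums.reverse()       # [2, 1, 3]

position = nums.index(1)  # 1
occurrences = nums.count(2)  # 1

print(nums, last, position, occurrences)  # [2, 1, 3] 4 1 1
```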

#### Tuples
+ Tuples are similar to lists: both allow us to store multiple values in a single variable
+ The major difference is that lists are mutable, i.e. their values can be changed, whereas tuples are immutable, i.e. their values cannot be changed
+ Tuples are used in situations where we may wish to restrict the manipulation of sequences of values
+ We can only fetch and read the values
+ A tuple is a data type/structure in Python that falls under the category of sequence data types
+ Here are some important features of tuples
    * Tuples are ordered collections of multiple values
    * Tuples are immutable
    * Tuples allow us to store duplicate values

#### creating a tuple
+ Tuples are created by placing the sequence inside parentheses `()`
+ The elements inside the parentheses are comma separated
+ A tuple can have any number of items, and they may be of different types (integer, float, string, etc.)
+ Example
    ```
    tuple1=()
    tuple2=(1,4,6,8,9)
    tuple3=("Gichuki", 23.5, 56, False)
    ```
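
One detail worth a sketch: it is the comma, not the parentheses, that makes a tuple, so a single-element tuple needs a trailing comma. The names below are illustrative.

```python
single = (5,)                 # a one-element tuple
not_a_tuple = (5)             # just the integer 5 in parentheses
packed = 1, "two", 3.0        # parentheses are optional when packing
from_list = tuple([1, 4, 6])  # built from another sequence

print(type(single).__name__)       # tuple
print(type(not_a_tuple).__name__)  # int
print(packed)                      # (1, 'two', 3.0)
print(from_list)                   # (1, 4, 6)
```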

#### manipulating tuples
+ the following are ways to manipulate tuples
    1. Accessing elements of a tuple
    2. Tuple slicing
    3. Updating a tuple (Spoiler alert: this is not possible)
    4. Deleting elements of a tuple

+ Accessing elements and tuple slicing - tuples are a sequence data type, thus each element is associated with an index that starts from zero.
+ Just like lists, we can use indexes to access and slice elements of a tuple.
+ Updating tuples - tuples are immutable, thus we cannot update tuples.
+ the code below will raise an error since tuples are immutable:
    ```
    tuple1 = (1, 2, 3, 5, 6, 7)
    tuple1[3] = 9
    print(tuple1)
    ```
**Output**
    ```
    TypeError: 'tuple' object does not support item assignment
    ```

+ Since tuples are immutable, deleting/removing individual tuple elements isn't possible
+ However, you can delete the entire tuple using the `del` keyword
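
The points above can be sketched together: reading and slicing work as with lists, deleting a single element fails, a "changed" tuple has to be built as a new object, and `del` removes the whole name. The variable names are illustrative.

```python
point = (1, 2, 3)

first = point[0]   # 1
tail = point[1:]   # (2, 3), slicing returns a new tuple

try:
    del point[0]   # deleting one element is not allowed
except TypeError as error:
    delete_message = str(error)

# To "change" a value, build a new tuple from pieces of the old one
updated = point[:1] + (9,) + point[2:]
print(updated)     # (1, 9, 3)

del point          # removes the whole tuple name
try:
    point
except NameError:
    name_removed = True
```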

#### operations on tuples
+ All the basic operations performed on lists can be performed on tuples as well
+ However, the result of the operation is always a new tuple since we cannot modify the existing tuple
+ operations,
    1. concatenation
    2. repetition
    3. membership
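
A minimal example of the three operations on tuples, showing that the originals are left unchanged:

```python
t1 = (1, 2, 3)
t2 = (4, 5, 6)

joined = t1 + t2    # concatenation creates a new tuple
repeated = t1 * 2   # repetition creates a new tuple
has_two = 2 in t1   # membership test

print(joined)    # (1, 2, 3, 4, 5, 6)
print(repeated)  # (1, 2, 3, 1, 2, 3)
print(has_two)   # True
print(t1)        # (1, 2, 3)
```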

#### built-in functions for tuples
* len(tuple) - returns the length of the tuple
* min(tuple) - returns the element of the tuple with the minimum value
* max(tuple) - returns the element of the tuple with the maximum value
+ since tuples are immutable, we cannot use methods such as `append()`, `insert()`, `remove()`, `pop()`, etc., which modify the sequence in place

+ There are 2 methods that we can use on tuples
    1. `index(element)`: returns the index of the first matched item
    2. `count(element)`: returns the number of times an element occurs in the sequence
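
The built-in functions and the two tuple methods, applied to a made-up tuple of marks:

```python
marks = (70, 85, 70, 92)

size = len(marks)               # 4
lowest = min(marks)             # 70
highest = max(marks)            # 92
first_position = marks.index(70)  # 0, the first match
occurrences = marks.count(70)     # 2

can_append = hasattr(marks, "append")  # False: no mutating methods on tuples

print(size, lowest, highest, first_position, occurrences, can_append)
# 4 70 92 0 2 False
```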


#### advantages of tuples
1. by convention, we generally use tuples for heterogeneous (different) data types and lists for homogeneous (similar) data types
2. tuples are slightly more lightweight than lists: they use a little less memory and can be created faster, although any speed-up when iterating is usually small
3. tuples that contain only immutable (hashable) elements can be used as dictionary keys; lists cannot
4. if you have data that does not change, implementing it as a tuple will guarantee that it remains write-protected
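
The dictionary-key point can be sketched as follows. The `grid` dictionary and its contents are made up for the example.

```python
# A tuple of immutable values works as a dictionary key
grid = {(0, 0): "origin", (1, 2): "treasure"}
print(grid[(1, 2)])  # treasure

# A list cannot be used as a key because it is not hashable
try:
    bad = {[0, 0]: "origin"}
except TypeError as error:
    list_key_error = str(error)

# A tuple that contains a list is not hashable either
try:
    hash((1, [2, 3]))
    nested_list_hashable = True
except TypeError:
    nested_list_hashable = False

print(nested_list_hashable)  # False
```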
