# Fundamentals of Data Analysis Tasks

**Author: Cecilia Pastore**

***



### Task 1 - The Collatz conjecture

<details>
    <summary>Task requested:</summary>
           <p>

The Collatz conjecture is a famous unsolved problem in mathematics. The problem is to prove that if you start with any positive integer x and repeatedly apply the function f(x) below, you always get stuck in the repeating sequence 1, 4, 2, 1, 4, 2, ...
$$
 f(x) =
  \begin{cases}
    x/2     & \quad \text{if } x \text{ is even}\\
    3x + 1  & \quad \text{if } x \text{ is odd}
  \end{cases}
$$


For example, starting with the value 10, which is an even number, we divide it by 2 to get 5. Then 5 is an odd number, so we multiply by 3 and add 1 to get 16. Then we repeatedly divide by 2 to get 8, 4, 2, 1. Once we are at 1, we go back to 4 and get stuck in the repeating sequence 4, 2, 1 as we suspected.

Your task is to verify, using Python, that the conjecture is true for the first 10,000 positive integers.

</p>
</details>
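
The worked example in the task statement can be reproduced in a few lines. This is only a minimal sketch: the helper name `collatz_step` is an assumption made for illustration and is not part of the task or of the code further down.

```python
def collatz_step(x):
    """Apply the Collatz rule once (hypothetical helper name)."""
    return x // 2 if x % 2 == 0 else 3 * x + 1


# Trace the example from the task statement, starting at 10.
trace = [10]
while trace[-1] != 1:
    trace.append(collatz_step(trace[-1]))

print(trace)            # [10, 5, 16, 8, 4, 2, 1]
print(collatz_step(1))  # 4, which restarts the 4, 2, 1 cycle
```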

I will divide the task into two parts:
1. In the first part, I define a function f(x) and a function collatz(x) to print the Collatz sequence for a number input by the user.

2. In the second part, I use a for loop to check the Collatz sequence for each number from 1 to 10,000 and verify that each sequence ends at 1.

**1.1 Verify the Collatz conjecture for a given number**

<details>
    <summary>How it works:</summary>
           <p>
           
1.  The script first prompts the user to enter a positive number. A while loop validates the input: if a positive number is provided, the loop is skipped. If the user enters zero or a negative number, the script keeps prompting until a positive number is provided.

2. The first function, f(x), is defined. This function checks whether the number is even or odd and, based on that, performs a specific calculation.
    If the number is even, the function divides it by 2 and returns the result.
    If the number is odd, the function multiplies the number by 3, adds 1, and returns the result.
   
3. Finally, the collatz(x) function is responsible for computing and presenting the Collatz sequence for the given number.

<details>
<p>

*Code explanation:*

Prompt the user to enter a positive number [[1]](https://www.geeksforgeeks.org/how-to-take-integer-input-in-python/), which is then converted to an integer and stored in the number variable.

In [12]:
number = int(input("Please Enter a positive number: "))

A first while loop [[2]](https://www.programiz.com/python-programming/while-loop) is used to ensure that the user enters a positive number. If the number entered is less than or equal to 0, the loop will execute and prompt the user to enter a positive number again.

In [13]:
while number <=0:
    print("You didn't enter a positive number!")
    number = int(input("Please Enter a positive number: "))
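
The cells above assume the user always types something that `int()` can parse; any other text (for example `abc` or `3.5`) raises a `ValueError`. A more defensive variant could look like the sketch below. The helper names `parse_positive_int` and `ask_positive_int` are assumptions introduced here for illustration, not part of the original script.

```python
def parse_positive_int(text):
    """Return text as a positive integer, or None if it is not one."""
    try:
        value = int(text)
    except ValueError:
        # Not an integer at all, e.g. "abc" or "3.5"
        return None
    return value if value > 0 else None


def ask_positive_int(prompt="Please Enter a positive number: "):
    """Keep prompting until the user types a positive integer."""
    while True:
        value = parse_positive_int(input(prompt))
        if value is not None:
            return value
        print("You didn't enter a positive number!")
```

With these helpers, `number = ask_positive_int()` would replace both the initial prompt and the while loop.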

Define a function **f(x)** that takes an input integer x and checks whether x is even or odd [[3]](https://www.toppr.com/guides/python-guide/examples/python-examples/python-program-to-check-if-a-number-is-odd-or-even/) using the modulo operator %. If x is even, it divides x by 2 using integer division (//) and returns the result. If x is odd, it performs the operation 3 * x + 1 and returns the result.

In [14]:
def f(x):
    # If x is even, divide it by two
    if x % 2 == 0:
        return x // 2
    else:
        # If x is odd, perform the operation (3 * x) + 1 and return the result
        return (3 * x) + 1
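
The choice of integer division matters: `/` always returns a float in Python 3, while `//` keeps the sequence made of integers. The short check below repeats the definition of f(x) so that it runs on its own, and uses values from the worked example in the task statement.

```python
def f(x):
    # Same rule as above: halve even numbers, map odd x to 3x + 1.
    if x % 2 == 0:
        return x // 2
    return (3 * x) + 1


print(10 / 2)   # 5.0 (true division returns a float)
print(10 // 2)  # 5   (integer division returns an int)

# Spot checks taken from the worked example: 10 -> 5 -> 16
print(f(10))  # 5
print(f(5))   # 16
```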
   

Define the **collatz(x)** function. It uses the previous function to print the Collatz sequence starting from x and continuing until the value reaches 1. During the loop, it prints each value of x in the sequence, separated by hyphens. Finally, it prints the value 1 to mark the end of the Collatz sequence.

In [15]:
def collatz(x):
    # Print a message indicating that we are testing the Collatz sequence with the initial value x
    print(f'Testing Collatz with initial value {x}')
    # while loop until x == 1
    while x != 1:
        # Print the current value of x, then update it using f(x)
        print(x, end="-")
        x = f(x)
    # Print the final value (1)
    print(x)

Finally, run the function **collatz(x)** on the number entered by the user.

In [16]:
collatz(number)

Testing Collatz with initial value 8
8-4-2-1
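
Printing inside the loop works well for display, but it makes the sequence hard to reuse. As an alternative sketch, the function below returns the sequence as a list and leaves the formatting to the caller. The name `collatz_sequence` is an assumption introduced for this example.

```python
def f(x):
    # Same rule as above.
    return x // 2 if x % 2 == 0 else (3 * x) + 1


def collatz_sequence(x):
    """Return the Collatz sequence from x down to 1 as a list."""
    sequence = [x]
    while x != 1:
        x = f(x)
        sequence.append(x)
    return sequence


sequence = collatz_sequence(8)
print("-".join(str(value) for value in sequence))  # 8-4-2-1
print(f"Steps to reach 1: {len(sequence) - 1}")    # Steps to reach 1: 3
```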


**1.2 Verify that the Collatz conjecture is true for the first 10,000 positive integers.**

<details>
    <summary>How it works:</summary>
           <p>
1. The script uses the same function defined in task 1.1, **f(x)**, to perform a specific calculation depending on whether the number is odd or even.

2. A list called **collatz_list** is initialized to store the results of the Collatz sequences for a range of positive integers.

3. The function **collatz_sec(x)** is defined. This function applies the Collatz sequence rules to a given positive integer x, iteratively updating it until it reaches the value 1, and then returns that final value.

4. A loop iterates through the positive integers from 1 to 10,000, calculates the Collatz sequence for each integer, and stores the final value in **collatz_list**.

5. The highest value within **collatz_list** is identified. If this maximum value is 1, the Collatz conjecture is verified for the range and the result is printed.

<details>
<p>

*Code Explanation:*

The function **collatz_sec(x)** calculates the Collatz sequence for a given positive integer x by iteratively applying the function f(x) until x reaches the final value of 1. It then returns this final value of 1 as per the Collatz sequence rule.

In [17]:
def collatz_sec(x):
    # Keep running the Collatz sequence until x reaches 1
    while x != 1:
        x = f(x)  # Apply the function f(x) to update x
    return x  # Return the final value of x, which is 1 for the Collatz sequence
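
One caveat: collatz_sec(x) only returns once x equals 1, so a starting value that never reached 1 would make the loop run forever. A possible safeguard, sketched below, is to cap the number of steps. The name `reaches_one` and the default cap of 10,000 steps are assumptions chosen for illustration.

```python
def f(x):
    # Same rule as in task 1.1.
    return x // 2 if x % 2 == 0 else (3 * x) + 1


def reaches_one(x, max_steps=10_000):
    """Return True if x reaches 1 within max_steps applications of f."""
    for _ in range(max_steps):
        if x == 1:
            return True
        x = f(x)
    # The step budget is used up; report whether the last step landed on 1
    return x == 1


print(reaches_one(8))                # True
print(reaches_one(8, max_steps=2))   # False: 8 -> 4 -> 2 needs one more step
```

With a cap in place, a `False` result means "not verified within the budget" rather than "the conjecture is false".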


The list **collatz_list** is defined as empty.

In [18]:
collatz_list = []  # Initialize an empty list to store results

A for loop runs collatz_sec for each integer from 1 to 10,000 and stores the results in collatz_list [[4]](https://www.freecodecamp.org/news/python-map-function-how-to-map-a-list-in-python-3-0-with-example-code/).

In [19]:
# Loop through positive integers from 1 to 10,000
for i in range(1, 10001):
    collatz_list.append(collatz_sec(i))  # Calculate the Collatz sequence for i and store the result
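
The same list can also be built with the built-in `map()` function, which is the approach described in reference [4]. The sketch below repeats the two function definitions so that it runs on its own; the variable name `collatz_results` is an assumption used to avoid overwriting collatz_list.

```python
def f(x):
    # Same rule as in task 1.1.
    return x // 2 if x % 2 == 0 else (3 * x) + 1


def collatz_sec(x):
    # Keep applying f until x reaches 1, then return it.
    while x != 1:
        x = f(x)
    return x


# Apply collatz_sec to every integer from 1 to 10,000 in one expression
collatz_results = list(map(collatz_sec, range(1, 10001)))

print(len(collatz_results))  # 10000
print(set(collatz_results))  # {1}
```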

A max() [[5]](https://www.w3schools.com/python/ref_func_max.asp) function is used to return the item with the highest value in the **collatz_list** and this is stored in the variable **max_number**.

In [20]:
max_number = max(collatz_list)  # Find the maximum value in the collatz_list

Finally, an if statement [[6]](https://www.w3schools.com/python/python_conditions.asp) checks whether the maximum value is equal to 1. If it is, every sequence started from 1 to 10,000 ended at 1, so the conjecture holds for that range and a corresponding message is printed. Otherwise, a message is printed indicating that the conjecture is not true. Note that collatz_sec only returns once x equals 1, so in practice the check can only pass; the real evidence is that the loop terminated for every starting value.

In [21]:

if max_number == 1:
    print("The Collatz conjecture is true for the first 10,000 positive integers.")
else:
    print("The Collatz conjecture is not true for the first 10,000 positive integers.")

The Collatz conjecture is true for the first 10,000 positive integers.
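
As an optional refinement, the check can stop early for most starting values. When the start values are visited in ascending order, a sequence that drops below its own start has reached a number that was already checked, so it does not need to be followed all the way down to 1. The sketch below illustrates the idea; the function name `verify_up_to` is an assumption and this is not the method used in the cells above.

```python
def f(x):
    # Same rule as in task 1.1.
    return x // 2 if x % 2 == 0 else (3 * x) + 1


def verify_up_to(limit):
    """Return True once every start value from 1 to limit has been checked."""
    for start in range(1, limit + 1):
        x = start
        # Stop once x falls below start: every smaller start value
        # was already handled in an earlier iteration.
        while x != 1 and x >= start:
            x = f(x)
    return True


if verify_up_to(10_000):
    print("Every sequence from 1 to 10,000 reaches 1.")
```

As with collatz_sec, the function can only return `True`; a counterexample would show up as a loop that never ends.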


**Sources**

<details>
       <summary>References:</summary>
              <p>

- [1] [How to take integer input in Python?](https://www.geeksforgeeks.org/how-to-take-integer-input-in-python/)
- [2] [Python while Loop](https://www.programiz.com/python-programming/while-loop)
- [3] [Python Program to Check if a Number is Odd or Even](https://www.toppr.com/guides/python-guide/examples/python-examples/python-program-to-check-if-a-number-is-odd-or-even/)
- [4] [Python Map – How to Map a List in Python 3.0, With Example Function Code](https://www.freecodecamp.org/news/python-map-function-how-to-map-a-list-in-python-3-0-with-example-code/)
- [5] [Python max() Function](https://www.w3schools.com/python/ref_func_max.asp)
- [6] [Python If ... Else](https://www.w3schools.com/python/python_conditions.asp)

<details>
    <summary>Other resources consulted:</summary>
           <p>

- [Creating Links in Markdown](https://anvilproject.org/guides/content/creating-links)
- [Python Tutorial: How to take an integer input in Python](https://pieriantraining.com/python-tutorial-how-to-take-an-integer-input-in-python/)
- [Python For Loops](https://www.w3schools.com/python/python_for_loops.asp)
- [Python List max() Method](https://www.tutorialspoint.com/python/list_max.htm)
- [How To add Elements to a List in Python](https://www.digitalocean.com/community/tutorials/python-add-to-list)
- [LaTeX/Mathematics](https://en.wikibooks.org/wiki/LaTeX/Mathematics)


### Task 2 - The Penguins Data Set

<div>
  <center><img src="https://i0.wp.com/begincodingnow.com/wp-content/uploads/2023/03/palmerpenguins.png?ssl=1" width="300"></center>
</div>

<div>
  <center><a href="https://allisonhorst.github.io/palmerpenguins/articles/art.html"><i>[Fig. 1] - Palmer penguins hex</i></a></center>
</div>



The Palmer Penguin dataset, also known as the Palmer Archipelago penguin dataset, is a popular and publicly available dataset that contains a collection of morphological measurements and observational data for three species of penguins found on the Palmer Archipelago near the Antarctic Peninsula [[1]](https://www.chegg.com/homework-help/questions-and-answers/palmerpenguins-dataset-collection-morphological-measurements-observational-data-three-spec-q114471634). This dataset is often used as an alternative to the Iris dataset for educational and research purposes in data science and machine learning [[2]](https://allisonhorst.github.io/palmerpenguins/articles/intro.html).

<div>
  <center><img src="https://allisonhorst.github.io/palmerpenguins/reference/figures/lter_penguins.png" width="500"></center>
</div>

<div>
  <center><a href="https://allisonhorst.github.io/palmerpenguins/articles/art.html"><i>[Fig. 2] - Palmer Species</i></a></center>
</div>

Data for this dataset were compiled during the years 2007 to 2009 by Dr. Kristen Gorman, a member of the Long Term Ecological Research Network, in collaboration with the Palmer Station, a research facility situated on Anvers Island in the Antarctic Peninsula.  

The objective was to study ecological sexual dimorphism and environmental variability within a community of Antarctic penguins of the genus *Pygoscelis*, as documented by Gorman, Williams, and Fraser (2014) [[3]](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0090081).

<div>
  <center><img src="https://cdn2.vectorstock.com/i/1000x1000/00/26/diagram-showing-body-part-penguin-vector-25010026.jpg" width="500"></center>
</div>

<div>
  <center><a href="https://www.vectorstock.com/royalty-free-vector/diagram-showing-body-part-penguin-vector-25010026"><i>[Fig. 3] - Body Part of a penguin</i></a></center>
</div>

The dataset consists of 344 penguin observations and records the following information for each penguin specimen [[4]](https://journal.r-project.org/articles/RJ-2022-020/):

1. Species: The species of penguin, which can be one of three types: Adelie, Gentoo, or Chinstrap.
2. Island: The specific island in the Palmer Archipelago where the penguin was observed. The dataset includes data from three islands: Biscoe, Dream, and Torgersen.
3. Culmen Length and Depth: Measurements of the culmen, which is the upper ridge of the penguin's beak, including its length and depth in millimeters.
4. Flipper Length: The length of the penguin's flipper, also measured in millimeters.
5. Body Mass: The mass of the penguin in grams.
6. Sex: The sex of the penguin, which can be male or female.
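
To make the structure of one observation concrete, the variables listed above can be sketched as a small Python data class. Everything in this sketch is an assumption made for illustration: the class name `PenguinRecord`, the field names, and the choice to allow `None` for values that are not recorded. It is not the schema of any particular package that distributes the dataset.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Allowed category values, taken from the list above
SPECIES = {"Adelie", "Gentoo", "Chinstrap"}
ISLANDS = {"Biscoe", "Dream", "Torgersen"}


@dataclass
class PenguinRecord:
    """One penguin observation (hypothetical field names)."""
    species: str
    island: str
    culmen_length_mm: Optional[float] = None
    culmen_depth_mm: Optional[float] = None
    flipper_length_mm: Optional[float] = None
    body_mass_g: Optional[float] = None
    sex: Optional[str] = None

    def __post_init__(self):
        # Reject categories that are not in the lists above
        if self.species not in SPECIES:
            raise ValueError(f"Unknown species: {self.species}")
        if self.island not in ISLANDS:
            raise ValueError(f"Unknown island: {self.island}")


print([field.name for field in fields(PenguinRecord)])
```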


<div>
  <center><img src="https://allisonhorst.github.io/palmerpenguins/reference/figures/culmen_depth.png" width="500"></center>
</div>

<div>
  <center><a href="https://allisonhorst.github.io/palmerpenguins/articles/art.html"><i>[Fig. 4] - Penguin bills</i></a></center>
</div>

***
### End