#### Description of the files in the repository

* <span style="color:purple">__.gitignore__</span> 

> File format: Text. Description: Specifies files and directories to be ignored by Git during commits and updates.

* <span style="color:purple">__analysis.py__</span>

> File format: Python script (.py). Description: Python script for analyzing the Iris dataset. Generates various analytical results based on the data.

* <span style="color:purple">__Corelation_matrix.png__</span>

> File format: PNG image. Description: Correlation matrix demonstrating the relationship between variables in the Iris dataset. Generated using the analysis script.

* <span style="color:purple">__Individual_ratios.txt__</span>

> File format: Text. Description: File containing individual length-to-width ratios for each iris specimen. Generated using the analysis script.

* <span style="color:purple">__iris-pic.png__</span>

> File format: PNG image. Description: Image of an iris, used for visualization in README.

* <span style="color:purple">__iris.csv__</span>

> File format: CSV. Description: Iris dataset containing information about the length and width of sepals and petals of iris flowers.

* <span style="color:purple">__Mean_ratios.txt__</span>

> File format: Text. Description: File containing mean length-to-width ratios for each iris species. Generated using the analysis script.

* <span style="color:purple">__petal length_cm_histogram.png, petal width_cm_histogram.png, sepal length_cm_histogram.png, sepal width_cm_histogram.png, species of flowers_histogram.png__</span>

> File format: PNG image. Description: Histograms for each variable in the Iris dataset. Generated using the analysis script.

* <span style="color:purple">__petal length_cm(Summary).txt, petal width_cm(Summary).txt, sepal length_cm(Summary).txt, sepal width_cm(Summary).txt, species of flowers(Summary).txt__</span>

> File format: Text. Description: Files with a brief description of each variable in the Iris dataset. Generated using the analysis script.

* <span style="color:purple">__ratio.py__</span>

> File format: Python script (.py). Description: Python module containing functions for calculating individual and mean length-to-width ratios.

* <span style="color:purple">__README.md__</span>

> File format: Markdown (.md). Description: README file with project description, installation and usage instructions, description of files and analytical conclusions based on the results obtained. 

* <span style="color:purple">__scatter_plot.png__</span>

> File format: PNG image. Description: Scatter plot showing relationships between pairs of variables in the Iris dataset. Generated using the analysis script.

* <span style="color:purple">__Summary.txt__</span>

> File format: Text. Description: File containing summary information about the Iris dataset. Generated using the analysis script.



#### Description of the Iris flower data analysis process

The main analysis.py script begins by importing the libraries and modules used to analyse the Iris flower dataset.

```python
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns 
import ratio
```

The name of the data file, iris.csv, is stored in a variable.

```python
filename="iris.csv"
```

1. **Data Summary**. 

All subsequent functions use a helper function that reads the data from the file and returns it.

```python
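def read_with_pandas():
   """Read the dataset from `filename` and return it as a DataFrame."""
   try:
      data = pd.read_csv(filename)
      return data
   except FileNotFoundError:
      # Report the missing file; the function then returns None
      print(f"File {filename} is not found")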
```

Three functions were created to obtain general characteristics for the variables: 

```python
def summary_info():  
```
This function calculates and returns summary statistics for the variables in the Iris dataset: count, mean, standard deviation, minimum, 25th percentile, median, 75th percentile, and maximum.
To display the summary statistics on the console, add `print(summaryinfo)` at the end of the function.
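
The body of `summary_info()` is not shown here. As a minimal sketch, assuming it relies on pandas' `describe()`, the statistics listed above can be obtained as follows. The function name `summary_info_sketch` and the sample values are hypothetical stand-ins, not the repository's code or data.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of iris.csv
sample = pd.DataFrame({
    "sepal length_cm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal width_cm": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
})

def summary_info_sketch(data):
    """Return count, mean, std, min, quartiles and max for numeric columns."""
    return data.describe()

stats = summary_info_sketch(sample)
print(stats)
```

The rows of the returned DataFrame are `count`, `mean`, `std`, `min`, `25%`, `50%`, `75%` and `max`, one column per numeric variable.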

```python
def summary_info_results():
```

This function calls summary_info() to get summary statistics for the Iris dataset.
It then writes the summary statistics to a text file named "Summary.txt" for ease of use and sharing.
The summary information includes general characteristics of the variables, checking for missing values, counts of the number of flowers of each species, and characteristics of variables by each species.
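
One possible way to assemble such a file is sketched below. It is an illustration under assumptions: the helper name `write_summary_sketch`, the section titles, the sample rows and the temporary output path are all hypothetical, and the real `summary_info_results()` may be organised differently.

```python
import os
import tempfile
import pandas as pd

# Hypothetical stand-in for a few rows of iris.csv
sample = pd.DataFrame({
    "sepal length_cm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal length_cm": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "species of flowers": ["Iris-setosa", "Iris-setosa",
                           "Iris-versicolor", "Iris-versicolor",
                           "Iris-virginica", "Iris-virginica"],
})

def write_summary_sketch(data, path, species_col="species of flowers"):
    """Write general, missing-value, per-species count and per-species statistics."""
    sections = [
        ("General characteristics", data.describe()),
        ("Missing values per column", data.isnull().sum()),
        ("Flowers per species", data[species_col].value_counts()),
        ("Characteristics by species", data.groupby(species_col).describe().T),
    ]
    with open(path, "w", encoding="utf-8") as f:
        for title, result in sections:
            f.write(f"{title}:\n{result.to_string()}\n\n")

# Write to a temporary directory so the repository's Summary.txt is untouched
summary_path = os.path.join(tempfile.mkdtemp(), "Summary_sketch.txt")
write_summary_sketch(sample, summary_path)
with open(summary_path, encoding="utf-8") as f:
    summary_text = f.read()
```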

```python
def separate_summary_info():
```

This function extracts summary statistics for each variable in the Iris dataset and writes them to a separate text file per variable.
For each variable, the summary statistics include the count, mean, standard deviation, minimum, 25th percentile, median, 75th percentile, and maximum.
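
A per-variable export of this kind could look like the sketch below. The file name pattern follows the `(Summary).txt` files listed in the repository. The helper name `separate_summary_sketch`, the sample rows and the temporary output directory are assumptions made for illustration.

```python
import os
import tempfile
import pandas as pd

# Hypothetical stand-in for a few rows of iris.csv
sample = pd.DataFrame({
    "sepal length_cm": [5.1, 4.9, 7.0, 6.4],
    "species of flowers": ["Iris-setosa", "Iris-setosa",
                           "Iris-versicolor", "Iris-versicolor"],
})

def separate_summary_sketch(data, out_dir):
    """Write one '<column>(Summary).txt' file per column and return the paths."""
    paths = []
    for column in data.columns:
        path = os.path.join(out_dir, f"{column}(Summary).txt")
        with open(path, "w", encoding="utf-8") as f:
            # describe() adapts to the column type (numeric or text)
            f.write(f"Summary for {column}:\n")
            f.write(data[column].describe().to_string() + "\n")
        paths.append(path)
    return paths

summary_paths = separate_summary_sketch(sample, tempfile.mkdtemp())
```

For a text column such as the species, `describe()` reports the count, the number of unique values, the most frequent value and its frequency instead of the numeric statistics.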


**Quick observation**: There are 150 measurements for each variable. The mean sepal length, sepal width, petal length and petal width are approximately 5.84 cm, 3.05 cm, 3.76 cm and 1.20 cm respectively. The standard deviation measures the spread of values around the mean. The minimum is the smallest observed value for each variable: for example, the minimum sepal length is 4.3 cm and the minimum sepal width is 2.0 cm.
The 25th percentile (Q1) is the value below which 25% of the observations fall. The 50th percentile (Q2, the median) divides the data set into two equal halves. The 75th percentile (Q3) is the value below which 75% of the observations fall.
The maximum is the largest observed value for each variable: for example, the maximum sepal length is 7.9 cm and the maximum sepal width is 4.4 cm.
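
To make the quartile and spread definitions concrete, here is a small NumPy sketch on five hypothetical values (not the full dataset). It uses the sample standard deviation (`ddof=1`), which is also what pandas' `describe()` reports.

```python
import numpy as np

# Five hypothetical measurements in cm, sorted for readability
values = np.array([4.3, 5.1, 5.8, 6.4, 7.9])

# Quartiles: values below which 25%, 50% and 75% of observations fall
q1, median, q3 = np.percentile(values, [25, 50, 75])

# Sample standard deviation is the square root of the sample variance
std = np.std(values, ddof=1)
variance = np.var(values, ddof=1)

print(f"Q1={q1}, median={median}, Q3={q3}, std={std:.3f}")
```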

2. **Histogram Generation**. 
3. **Scatter Plot Creation**. 
4. **Additional Analysis**. 

The following function generates a histogram for each variable:

```python
def safe_histogram():
```
The outputs are: 

<div style="display:flex; flex-direction:row;">
    <img src="petal length_cm_histogram.png" alt="Petal Length Histogram" style="width:50%;">
    <img src="petal width_cm_histogram.png" alt="Petal Width Histogram" style="width:50%;">
</div>

<div style="display:flex; flex-direction:row;">
    <img src="sepal length_cm_histogram.png" alt="Sepal Length Histogram" style="width:50%;">
    <img src="sepal width_cm_histogram.png" alt="Sepal Width Histogram" style="width:50%;">
</div>

These histograms provide visual insights into the distribution of each variable and can be useful for exploratory data analysis.
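
The function body is not reproduced above. A minimal sketch of how one histogram per variable can be saved with matplotlib is given below. The helper name `save_histograms_sketch`, the bin count, the sample rows and the temporary output directory are assumptions, and the `Agg` backend is chosen only so the example runs without a display.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # render to files without opening a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for a few rows of iris.csv
sample = pd.DataFrame({
    "petal length_cm": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal width_cm": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

def save_histograms_sketch(data, out_dir, bins=10):
    """Save '<column>_histogram.png' for every column and return the paths."""
    paths = []
    for column in data.columns:
        fig, ax = plt.subplots()
        ax.hist(data[column], bins=bins, edgecolor="black")
        ax.set_title(f"Histogram of {column}")
        ax.set_xlabel(column)
        ax.set_ylabel("Frequency")
        path = os.path.join(out_dir, f"{column}_histogram.png")
        fig.savefig(path)
        plt.close(fig)  # free the figure before drawing the next one
        paths.append(path)
    return paths

hist_paths = save_histograms_sketch(sample, tempfile.mkdtemp())
```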

The correlation coefficients between pairs of variables are:

| Variable 1    | Variable 2    | Correlation Coefficient | Relationship Description              |
|---------------|---------------|------------------------|----------------------------------------|
| Petal Length  | Petal Width   | 0.96                   | Strong positive correlation            |
| Petal Length  | Sepal Length  | 0.87                   | Strong positive correlation            |
| Sepal Length  | Petal Width   | 0.82                   | Strong positive correlation            |
| Sepal Length  | Sepal Width   | -0.11                  | Very weak negative correlation        |
| Sepal Width   | Petal Width   | -0.36                  | Weak negative correlation              |
| Sepal Width   | Petal Length  | -0.42                  | Weak negative correlation              |
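
Coefficients like those in the table can be computed with pandas' `DataFrame.corr()`, which returns Pearson correlations by default. The sketch below uses six hypothetical rows, so its numbers will differ from the table, which is based on all 150 specimens.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of iris.csv
sample = pd.DataFrame({
    "sepal length_cm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal width_cm": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal length_cm": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal width_cm": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

# Pairwise Pearson correlation between all numeric columns
corr = sample.corr()
print(corr.round(2))
```

The result is a symmetric matrix with ones on the diagonal. Such a matrix is the usual input for a heatmap like the one in Corelation_matrix.png.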

The mean length-to-width ratios for each species are:
``` 
Mean Ratios:
Species: Iris-setosa
Mean Petal Length to Width Ratio: 7.08
Mean Sepal Length to Width Ratio: 1.47

Species: Iris-versicolor
Mean Petal Length to Width Ratio: 3.24
Mean Sepal Length to Width Ratio: 2.16

Species: Iris-virginica
Mean Petal Length to Width Ratio: 2.78
Mean Sepal Length to Width Ratio: 2.23
``` 
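
The functions in ratio.py are not shown here. The sketch below illustrates one way to obtain such values, assuming the mean ratio is the average of the individual length-to-width ratios within each species (rather than the ratio of the mean length to the mean width). The helper name `mean_ratio_sketch` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of iris.csv
sample = pd.DataFrame({
    "petal length_cm": [1.4, 1.5, 4.5, 4.0],
    "petal width_cm": [0.2, 0.3, 1.5, 1.0],
    "species of flowers": ["Iris-setosa", "Iris-setosa",
                           "Iris-versicolor", "Iris-versicolor"],
})

def mean_ratio_sketch(data, length_col, width_col,
                      species_col="species of flowers"):
    """Average the per-specimen length/width ratio within each species."""
    individual = data[length_col] / data[width_col]
    return individual.groupby(data[species_col]).mean().round(2)

petal_ratios = mean_ratio_sketch(sample, "petal length_cm", "petal width_cm")
print(petal_ratios)
```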

The summary statistics for the numeric variables are:

|              | sepal length_cm | sepal width_cm | petal length_cm | petal width_cm |
|--------------|-----------------|----------------|-----------------|----------------|
| count        | 150.000000      | 150.000000     | 150.000000      | 150.000000     |
| mean         | 5.843333        | 3.054000       | 3.758667        | 1.198667       |
| std          | 0.828066        | 0.433594       | 1.764420        | 0.763161       |
| min          | 4.300000        | 2.000000       | 1.000000        | 0.100000       |
| 25%          | 5.100000        | 2.800000       | 1.600000        | 0.300000       |
| 50%          | 5.800000        | 3.000000       | 4.350000        | 1.300000       |
| 75%          | 6.400000        | 3.300000       | 5.100000        | 1.800000       |
| max          | 7.900000        | 4.400000       | 6.900000        | 2.500000       |


Second:

```python
def save_boxplots():
```

This function generates and saves box plots for the Iris dataset. The box plots show the range, median, and variability of each measurement, and they highlight potential outliers and differences between the distributions of the variables.

The output is: 

<div style="display:flex; flex-direction:row;">
    <img src="Boxplots.png" alt="Boxplots" style="width:60%;">
</div>
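
The quantities a box plot displays can also be computed directly. The sketch below assumes the common convention, which is also matplotlib's default, of placing the outlier fences 1.5 times the interquartile range beyond the quartiles. The helper name `boxplot_stats_sketch` and the sample values are hypothetical.

```python
import numpy as np

def boxplot_stats_sketch(values, whis=1.5):
    """Return quartiles, outlier fences and outliers for one variable."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1  # interquartile range: height of the box
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Points beyond the fences are drawn as individual outlier markers
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "outliers": outliers.tolist(),
    }

# Hypothetical widths in cm
box_stats = boxplot_stats_sketch([2.0, 2.8, 3.0, 3.0, 3.3, 3.4, 4.4])
print(box_stats)
```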

Third: 

```python
def save_radviz_plot():
```

This function generates and saves a RadViz plot for the Iris dataset, which helps visualize the relationships and similarities between the species based on their features.

The output is: 

<div style="display:flex; flex-direction:row;">
    <img src="radviz_plot.png" alt="radviz" style="width:60%;">
</div>

**Quick observation**. A RadViz (Radial Visualization) plot is a tool for visualizing multivariate data in a two-dimensional space. Each variable is assigned to a point on the circumference of a circle (called an anchor), and each data point is plotted within the circle based on the influence of each variable.
The RadViz plot helps identify clusters of data points with similar characteristics. In the resulting plot, Iris-virginica and Iris-versicolor have very similar characteristics and are difficult to distinguish from each other, while Iris-setosa is separated from the other two species. Additionally, data points located closer to a specific anchor indicate that the corresponding variable has a stronger influence on these points. For instance, the variable "Sepal width" has the most influence on the species Iris-setosa.
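
The placement rule described above can be sketched with NumPy. This is an illustration of the usual RadViz construction, not the code behind radviz_plot.png. The function name `radviz_project_sketch` and the four sample rows are hypothetical. Each variable is rescaled to the range 0 to 1, and each data point becomes the weighted average of the anchor positions.

```python
import numpy as np

def radviz_project_sketch(features):
    """Project rows of `features` into the unit circle, RadViz style."""
    X = np.asarray(features, dtype=float)
    # Rescale every variable to [0, 1] so no variable dominates by scale
    normalized = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    # Place one anchor per variable, evenly spaced on the unit circle
    n_vars = X.shape[1]
    angles = 2 * np.pi * np.arange(n_vars) / n_vars
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])
    # Each point is the average of the anchors, weighted by its values.
    # Assumes no row is minimal in every variable (weights would sum to 0).
    weights = normalized.sum(axis=1, keepdims=True)
    return (normalized @ anchors) / weights, anchors

# Columns: sepal length, sepal width, petal length, petal width (cm)
sample_features = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
    [4.9, 2.4, 3.3, 1.0],
]
points, anchors = radviz_project_sketch(sample_features)
# Index of the anchor closest to the first row
nearest_anchor = int(np.argmin(np.linalg.norm(anchors - points[0], axis=1)))
print(points.round(3), nearest_anchor)
```

In this small sample the first row has the largest sepal width and the smallest petal measurements, so it lands close to the sepal width anchor (index 1).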