# Tasks
These are my solutions to the "Tasks 2020" assessment as part of my fourth year module "Emerging Technologies".

***
### Task 1: Write a Python function called ```sqrt2``` that calculates and prints to the screen the square root of 2 to 100 decimal places. Your code should not depend on any module from the standard library or otherwise. You should research the task first and include references and a description of your algorithm.

#### Research
The square root of 2 can be calculated by using the Babylonian method [1] [2].

$$ x_{0} \approx \sqrt{S}, $$ <br>
$$ x_{n+1} = \frac{1}{2} \left (x_{n} + \frac{S}{x_{n}} \right), $$ <br>
$$ \sqrt{S} = \lim_{n \to \infty} x_{n}. $$

Procedures for finding the square root of a number were known to the Babylonians from at least 1600 BC. YBC 7289 is a clay tablet that contains a base 60 approximation of the square root of 2, accurate to about six decimal places [13].

The Babylonian method starts from an initial guess $ x_{0} $ for the square root of $ S $, the number whose square root is required. The formula is then applied repeatedly, each time giving a better approximation, until the new approximation is the same as the previous one.

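Python floats are double-precision numbers that hold only about 15 to 17 significant decimal digits, so a float cannot represent the square root of 2 to 100 decimal places [3]. A value can still be printed with 100 decimal places by multiplying it by 10**200, converting it to a string and then to a list [4], but any digits beyond the float's precision will not be accurate.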

#### Example: Square Root of 2
Make an initial guess of 1.2 and apply the formula to get a better approximation of the square root. $ S $ is 2.

$ x_{0} \approx 1.2 $

$ x_{1} = \frac{1}{2} \left ({1.2} + \frac{2}{{1.2}} \right) = 1.433 $

$ x_{2} = \frac{1}{2} \left ({1.433} + \frac{2}{{1.433}} \right) = 1.414 $

$ x_{3} = \frac{1}{2} \left ({1.414} + \frac{2}{{1.414}} \right) = 1.414 $

Hence $ \sqrt{2} \approx 1.414 $
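
The hand calculation above can be reproduced with a few lines of plain Python. This is only a minimal sketch: the function name `babylonian_steps` and the fixed number of steps are illustrative choices, not part of the task.

```python
def babylonian_steps(S, x0, steps):
    """Return the successive Babylonian approximations of sqrt(S), starting from x0."""
    approximations = [x0]
    x = x0
    for _ in range(steps):
        # x_{n+1} = (x_n + S / x_n) / 2
        x = (x + S / x) / 2
        approximations.append(x)
    return approximations

# Same starting guess and number of steps as the worked example.
for n, x in enumerate(babylonian_steps(2, 1.2, 3)):
    print(f"x_{n} = {x:.3f}")
```

Rounded to three decimal places, the values printed are 1.200, 1.433, 1.414 and 1.414, which agrees with the worked example.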

In [171]:
def sqrt2(S):
    """
    Approximate the square root of S using the Babylonian method.

    For S == 2 the result is returned as a string with 100 decimal places.
    Because the calculation uses floats, only about the first 16 significant
    digits of that string are accurate. For any other S the float is returned.
    """

    # Initial guess for the square root.
    guess = S / 2.0

    # Start x at a different value so that the loop runs at least once.
    x = guess + 1

    # Iterate until two successive approximations are equal.
    while guess != x:
        # Remember the previous approximation.
        x = guess

        # Apply the Babylonian formula.
        guess = (guess + (S / guess)) / 2

    # The task only asks for the square root of 2 to 100 decimal places,
    # so the string formatting below is applied only when S is 2.
    if S == 2:
        # Scale the float so that the digits sit before the decimal point.
        guess = guess * (10 ** 200)

        # Format the scaled value without scientific notation.
        answer = "{0:.0f}".format(guess)

        # Split the string into a list of characters.
        answer = list(answer)

        # Insert a decimal point after the first digit.
        answer.insert(1, ".")

        # Join the characters back into a single string.
        answer = "".join(answer)

        # Keep the leading digit, the decimal point and 100 decimal places.
        answer = answer[0:102]

        return answer

    # For any number other than 2, return the float approximation.
    else:
        return guess

print("The square root of 2 is " + sqrt2(2))

The square root of 2 is 1.4142135623730948729028552009655273607485832256919402696392729172311702355446324552662687021007677738


In [172]:
# Test the function on 100.
print("The square root of 100 is {:.100f}".format(sqrt2(100)))

The square root of 100 is 10.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000


In [173]:
# Test the function on 5.
print("The square root of 5 is {:.100f}".format(sqrt2(5)))

The square root of 5 is 2.2360679774997898050514777423813939094543457031250000000000000000000000000000000000000000000000000000


In [174]:
# Test the function on 3.
print("The square root of 3 is {:.100f}".format(sqrt2(3)))

The square root of 3 is 1.7320508075688771931766041234368458390235900878906250000000000000000000000000000000000000000000000000


In [175]:
# Test the function on 932.
print("The square root of 932 is {:.100f}".format(sqrt2(932)))

The square root of 932 is 30.5286750449474979518527106847614049911499023437500000000000000000000000000000000000000000000000000000


A better way to calculate the square root of a number to 100 decimal places is to work with integers only, which Python handles with arbitrary precision:

In [176]:
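def sqrt2V2(S):
    """
    Return the square root of S as a string with 100 decimal places,
    using integer arithmetic only.
    """
    # Scale S so that the integer square root of x carries 100 decimal digits.
    x = S * 10 ** 200

    # Initial approximation of the integer square root.
    r = x

    def test_diffs(x, r):
        # Compare the error of r with the errors of its neighbours r - 1 and r + 1.
        d0 = abs(x - r**2)
        dm = abs(x - (r-1)**2)
        dp = abs(x - (r+1)**2)
        # minimised: r is at least as close to the root as both neighbours.
        minimised = d0 <= dm and d0 <= dp
        # below_min: r + 1 is closer than r - 1, so r lies below the best value.
        below_min = dp < dm
        return minimised, below_min

    while True:
        oldr = r
        # Babylonian step using floor division to stay in integers.
        r = (r + x // r) // 2

        minimised, below_min = test_diffs(x, r)
        if minimised:
            break

        # If the iteration has stalled, move r by one towards the best value.
        if r == oldr:
            if below_min:
                r += 1
            else:
                r -= 1
            minimised, _ = test_diffs(x, r)
            if minimised:
                break

    # Split the scaled integer into its integer part and 100 decimal digits.
    return f"{r // 10**100}.{r % 10**100:0100d}"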

print("The square root of 2 is " + sqrt2V2(2))

The square root of 2 is 1.4142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727
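
One way to sanity-check these digits without importing anything is to compute the floor of the square root of 2 * 10**200 independently and confirm that it brackets the true value. The helper name `isqrt_newton` below is hypothetical, and this sketch truncates rather than rounds, so it is a cross-check rather than a replacement for the function above.

```python
def isqrt_newton(n):
    """Floor of the square root of a non-negative integer, using integers only."""
    if n < 2:
        return n
    x = n
    y = (x + 1) // 2
    # Each step moves down towards the root; stop once it no longer decreases.
    while y < x:
        x = y
        y = (x + n // x) // 2
    return x

scaled = 2 * 10 ** 200
root = isqrt_newton(scaled)

# root is the floor of sqrt(2) * 10**100, so its square must not exceed
# the scaled value, while the square of root + 1 must.
assert root * root <= scaled < (root + 1) ** 2

digits = f"{root // 10**100}.{root % 10**100:0100d}"
print(digits)
```

Because the check is done with exact integers, it does not suffer from the float precision limit described in the research notes.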


#### References
[1] Methods of computing square roots; Babylonian method; https://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Babylonian_method

[2] Python Math: Computing square roots using the Babylonian method; Python Math: Exercise-18 with Solution; https://www.w3resource.com/python-exercises/math/python-math-exercise-18.php

[3] 15. Floating Point Arithmetic: Issues and Limitations; https://docs.python.org/3/tutorial/floatingpoint.html

[4] Is there a way to create more decimal points on Python without importing a library/module?; https://stackoverflow.com/a/64278569

[13] YBC 7289; https://en.wikipedia.org/wiki/YBC_7289

***
### Task 2: The Chi-squared test for independence is a statistical hypothesis test like a t-test. It is used to analyse whether two categorical variables are independent. The Wikipedia article gives the table below as an example [4], stating the Chi-squared value based on it is approximately 24.6. Use ```scipy.stats``` to verify this value and calculate the associated p value. You should include a short note with references justifying your analysis in a markdown cell.

The second task is to verify the chi-squared value of approximately 24.6 using the data from the table below.

|              	| A   	| B   	| C   	| D   	| total 	|
|--------------	|-----	|-----	|-----	|-----	|-------	|
| White collar 	| 90  	| 60  	| 104 	| 95  	| 349   	|
| Blue collar  	| 30  	| 50  	| 51  	| 20  	| 151   	|
| No collar    	| 30  	| 40  	| 45  	| 35  	| 150   	|
| Total        	| 150 	| 150 	| 200 	| 150 	| 650   	|

The null hypothesis is that each person's neighborhood of residence is independent of the person's occupational classification.

#### Research
A chi-square ($ \chi^2 $) statistic is a test that measures how a model compares to actual observed data. The data used in calculating a chi-square statistic must be random, raw, mutually exclusive, drawn from independent variables, and drawn from a large enough sample. For example, the results of tossing a fair coin meet these criteria [5].

A p-value is used in hypothesis testing to help decide whether to reject the null hypothesis. It is the probability of observing data at least as extreme as the data actually observed, assuming the null hypothesis is true. The smaller the p-value, the stronger the evidence against the null hypothesis [8].

The formula to calculate the expected value for a cell is: [6]

$$ E_{ij} = \frac{R_iC_j}{N} $$

Where
<br>$ R_i $ = total of the $i$th row
<br>$ C_j $ = total of the $j$th column
<br>$ N $ = grand total of all observations

The chi-squared formula is: [7]

$$ \chi^2_c = \sum_{i} \frac{(O_i - E_i)^2}{E_i} $$

Where
<br>$ c $ = degrees of freedom
<br>$ O_i $ = observed value of the $i$th cell
<br>$ E_i $ = expected value of the $i$th cell
<br>and the sum runs over all cells of the table

#### Example: Calculating the Expected Value of White Collar Workers for Column A
Calculating the expected value of white collar workers for column A would be as follows:

$ E = \frac{349 \times 150}{650} $

$ E \approx 80.5385 $

Then use the chi-square formula where the observed value $ O $ is 90 and expected value $ E $ is 80.5385:

$ \chi^2 = \frac{(90 - 80.5385)^2}{80.5385} $

$ \chi^2 \approx 1.1115 $

Doing this for all cells and adding them together will give an approximate value of 24.6.
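
The same calculation can be carried out for every cell in plain Python. This is a sketch of the arithmetic described above, not the ```scipy.stats``` implementation; the variable names are illustrative.

```python
# Observed counts from the table, without the totals.
observed = [
    [90, 60, 104, 95],  # White collar
    [30, 50, 51, 20],   # Blue collar
    [30, 40, 45, 35],   # No collar
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count for the cell: row total * column total / grand total.
        expected = row_totals[i] * col_totals[j] / grand_total
        # Add this cell's contribution to the statistic.
        chi_squared += (o - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(row_totals) - 1) * (len(col_totals) - 1)

print("chi-squared:", round(chi_squared, 4))
print("degrees of freedom:", dof)
```

The row and column totals computed here match the totals shown in the table, which is a useful check that the counts were entered correctly.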

In [177]:
# Code to get chi-squared statistics for the data in the table.
import scipy.stats as stats

# Populate the arrays.
whiteCollar = [90, 60, 104, 95]
blueCollar = [30, 50, 51, 20]
noCollar = [30, 40, 45, 35]

data = [whiteCollar, blueCollar, noCollar]

chi2, p, dof, ex = stats.chi2_contingency(data)

# Print out the values.
print("chi2: ", chi2)
print("p-value: ", p)
print("degrees of freedom: ", dof)
print("expected frequencies:")
print(ex)

chi2:  24.5712028585826
p-value:  0.0004098425861096696
degrees of freedom:  6
expected frequencies:
[[ 80.53846154  80.53846154 107.38461538  80.53846154]
 [ 34.84615385  34.84615385  46.46153846  34.84615385]
 [ 34.61538462  34.61538462  46.15384615  34.61538462]]


The p-value of the data is 0.0004098425861096696. As this is less than the commonly used significance level of 0.05, the result is statistically significant. Therefore, we reject the null hypothesis that each person's neighborhood of residence is independent of the person's occupational classification [9].

#### References
[5] Chi-Square (χ2) Statistic Definition; What Is a Chi-Square Statistic?; https://www.investopedia.com/terms/c/chi-square-statistic.asp

[6] The chi-square test; Getting expected values; https://web.stanford.edu/class/psych252/cheatsheets/chisquare.html

[7] Chi-Square (χ2) Statistic Definition; The Formula for Chi-Square Is; https://www.investopedia.com/terms/c/chi-square-statistic.asp

[8] P Value Definition; https://www.statisticshowto.com/p-value/

[9] What a p-value tells you about statistical significance; https://www.simplypsychology.org/p-value.html

***
### Task 3: The standard deviation of an array of numbers ```x``` is calculated using ```numpy``` as ```np.sqrt(np.sum((x - np.mean(x))**2)/len(x)) ```. However, Microsoft Excel has two different versions of the standard deviation calculation, ```STDEV.P``` and ```STDEV.S ```. The ```STDEV.P``` function performs the above calculation but in the ```STDEV.S``` calculation the division is by ```len(x)-1``` rather than ```len(x) ```. Research these Excel functions, writing a note in a Markdown cell about the difference between them. Then use ```numpy``` to perform a simulation demonstrating that the ```STDEV.S``` calculation is a better estimate for the standard deviation of a population when performed on a sample. Note that part of this task is to figure out the terminology in the previous sentence.

#### Research
A standard deviation is a measure of the spread of data values [11]. Assuming a normal distribution, about 68% of the values are within one standard deviation of the mean, about 95% are within two standard deviations and about 99.7% are within three standard deviations [12].

#### Difference Between ```STDEV.P``` and ```STDEV.S```
The ```STDEV.P``` function is used for data representing the entire population, while the ```STDEV.S``` function is used for data that is a sample of the population [10].

The formula for calculating the standard deviation for a population is: [10]

$$ \sigma = {\sqrt {\frac {\sum(x_{i}-{\mu})^{2}}{N}}} $$

Where
<br>$ \sigma $ = population standard deviation
<br>$ N $ =	the size of the population
<br>$ x_i $	= each value from the population
<br>$ \mu $ = the population mean

The formula for calculating the standard deviation for a sample is:

$$ s = {\sqrt {\frac {\sum _{i=1}^{N}(x_{i}-{\overline {x}})^{2}}{N - 1}}} $$

Where
<br>$ s $ = sample standard deviation
<br>$ N $ = the number of observations
<br>$ x_i $ = the observed values of a sample item
<br> $ \overline {x} $ = the mean value of the observations
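
In ```numpy``` the two formulas differ only in the divisor, which ```np.std``` exposes through its ```ddof``` argument. The short sketch below uses a small made-up array purely for illustration.

```python
import numpy as np

# Illustrative data only: mean 5, sum of squared deviations 32.
x = np.array([2, 4, 4, 4, 5, 5, 7, 9])

population_sd = np.std(x, ddof=0)  # divides by N, like STDEV.P
sample_sd = np.std(x, ddof=1)      # divides by N - 1, like STDEV.S

print("ddof=0:", population_sd)
print("ddof=1:", sample_sd)
```

For any array with more than one distinct value, the ```ddof=1``` result is slightly larger because the same sum of squares is divided by a smaller number.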

In [178]:
import numpy as np

# https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html

loc = 100
scale = 10
size = 10000

data = np.random.normal(loc, scale, size)

In [179]:
print("Mean of the data", np.mean(data))

Mean of the data 99.96463259021039


In [180]:
# Standard deviation population (STDEV.P) function on the population.
def STDEV_P(x):
    return np.sqrt(np.sum((x - np.mean(x)) ** 2) / len(x))

print("Standard deviation population (STDEV.P)", STDEV_P(data))

Standard deviation population (STDEV.P) 9.93684460510139


In [181]:
# Standard deviation sample (STDEV.S) function on the population.
def STDEV_S(x):
    return np.sqrt(np.sum((x - np.mean(x)) ** 2) / (len(x) - 1))

print("Standard deviation sample (STDEV.S)", STDEV_S(data))

Standard deviation sample (STDEV.S) 9.937341484597917


In [182]:
# Draw a sample of 2000 values (20% of the data).
# Note: np.random.choice samples with replacement by default.
sampleOfData = np.random.choice(data, 2000)

stdSSample = STDEV_S(sampleOfData)
stdSPopulation = STDEV_S(data)
stdPPopulation = STDEV_P(data)

# Compare the two values to STDEV.P being used on the whole population.
print("STDEV.P of the whole data", stdPPopulation)

# Get the standard deviation using STDEV.S on a sample and on the whole population.
print("\nSTDEV.S of the sample data:", stdSSample)
print("Accuracy (closer to 100% is more accurate):", (stdSSample / stdPPopulation) * 100, "%")

print("\nSTDEV.S of the whole data:", stdSPopulation)
print("Accuracy (closer to 100% is more accurate):", (stdSPopulation / stdPPopulation) * 100, "%")


STDEV.P of the whole data 9.93684460510139

STDEV.S of the sample data: 9.89469199673814
Accuracy (closer to 100% is more accurate): 99.57579483187642 %

STDEV.S of the whole data: 9.937341484597917
Accuracy (closer to 100% is more accurate): 100.00500037503124 %


#### Conclusion
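Although the task asks for a demonstration that STDEV.S is a better estimate of the population standard deviation when performed on a sample, the run above does not show this. Applying STDEV.S to the whole data set gave a result closer to STDEV.P of the population than applying STDEV.S to the sample. A single sample of 2000 values cannot settle the question, because at that size dividing by N - 1 rather than N changes the result by only about 0.025%.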
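
The difference between the two divisors is easier to see when many small samples are drawn and both calculations are applied to each one. The sketch below is one possible way to set that up; the seed, sample size and number of trials are assumptions chosen for illustration, not values from the cells above.

```python
import numpy as np

rng = np.random.default_rng(21)  # seed chosen arbitrarily for reproducibility

population_sd = 10   # assumed true standard deviation of the population
sample_size = 5      # assumed (deliberately small) sample size
trials = 100_000     # assumed number of repeated samples

# Each row is one independent sample drawn from the population.
samples = rng.normal(100, population_sd, size=(trials, sample_size))

# Standard deviation of every sample, dividing by N and by N - 1.
sd_p = np.std(samples, axis=1, ddof=0)
sd_s = np.std(samples, axis=1, ddof=1)

mean_sd_p = sd_p.mean()
mean_sd_s = sd_s.mean()

print("Population standard deviation:", population_sd)
print("Average STDEV.P over the samples:", mean_sd_p)
print("Average STDEV.S over the samples:", mean_sd_s)
```

Under these assumptions the average of the STDEV.S values should land closer to the population value than the average of the STDEV.P values, although with samples this small both tend to fall somewhat below it.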

#### References
[10] STDEV.S function; Remarks; https://support.microsoft.com/en-us/office/stdev-s-function-7d69cf97-0c1f-4acf-be27-f3e83904cc23

[11] Population and sample standard deviation review; Population and sample standard deviation; https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/variance-standard-deviation-sample/a/population-and-sample-standard-deviation-review

[12] 68–95–99.7 rule; https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule

***
### Task 4: Use ```scikit-learn``` to apply k-means clustering to Fisher’s famous Iris data set. You will easily obtain a copy of the data set online. Explain in a Markdown cell how your code works and how accurate it might be, and then explain how your model could be used to make predictions of species of iris.

#### Research
```scikit-learn``` is a machine learning library for Python [14].

Clustering is the task of dividing a population or set of data points into groups such that points in the same group are more similar to each other than to points in other groups. In other words, objects are grouped on the basis of the similarity and dissimilarity between them [17].

The K-means algorithm starts by randomly choosing a centroid value for each cluster. After that the algorithm iteratively performs three steps: (i) Find the Euclidean distance between each data instance and centroids of all the clusters; (ii) Assign the data instances to the cluster of the centroid with nearest distance; (iii) Calculate new centroid values based on the mean values of the coordinates of all the data instances from the corresponding cluster [16].
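
The three steps can be sketched directly in ```numpy```. This is a minimal illustration of the description above rather than the ```scikit-learn``` implementation; the function name `kmeans_sketch`, its parameters and the toy points are all hypothetical.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Very small k-means: returns (centroids, labels) for a 2-D array of points."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)

    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    for _ in range(n_iter):
        # (i) Euclidean distance from every point to every centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)

        # (ii) Assign each point to the nearest centroid.
        labels = distances.argmin(axis=1)

        # (iii) Move each centroid to the mean of its assigned points
        #       (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])

        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return centroids, labels

# Toy data: two visibly separate groups of points.
points = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0], [8.0, 8.2], [8.1, 7.9], [7.9, 8.0]]
centroids, labels = kmeans_sketch(points, k=2)
print(centroids)
print(labels)
```

Because the starting centroids are random, the cluster numbers themselves carry no meaning; only the grouping of the points does.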

#### Difference Between Supervised and Unsupervised Learning
In supervised learning, the algorithm learns on a labeled dataset, providing an answer key that the algorithm can use to evaluate its accuracy on training data. An unsupervised model, in contrast, provides unlabeled data that the algorithm tries to make sense of by extracting features and patterns on its own [15].

#### Fisher's Iris Data
The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish the species from each other [18].


#### References
[14] scikit-learn; https://en.wikipedia.org/wiki/Scikit-learn

[15] SuperVize Me: What’s the Difference Between Supervised, Unsupervised, Semi-Supervised and Reinforcement Learning?; https://blogs.nvidia.com/blog/2018/08/02/supervised-unsupervised-learning/

[16] K-Means Clustering with Scikit-Learn; Introduction; https://stackabuse.com/k-means-clustering-with-scikit-learn/

[17] Clustering in Machine Learning; Introduction to Clustering; https://www.geeksforgeeks.org/clustering-in-machine-learning/

[18] Iris flower data set; https://en.wikipedia.org/wiki/Iris_flower_data_set

In [33]:
# Fisher's Iris Data.
irisDataSet = [[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
        [5.5, 4.2, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.2],
        [5. , 3.2, 1.2, 0.2],
        [5.5, 3.5, 1.3, 0.2],
        [4.9, 3.6, 1.4, 0.1],
        [4.4, 3. , 1.3, 0.2],
        [5.1, 3.4, 1.5, 0.2],
        [5. , 3.5, 1.3, 0.3],
        [4.5, 2.3, 1.3, 0.3],
        [4.4, 3.2, 1.3, 0.2],
        [5. , 3.5, 1.6, 0.6],
        [5.1, 3.8, 1.9, 0.4],
        [4.8, 3. , 1.4, 0.3],
        [5.1, 3.8, 1.6, 0.2],
        [4.6, 3.2, 1.4, 0.2],
        [5.3, 3.7, 1.5, 0.2],
        [5. , 3.3, 1.4, 0.2],
        [7. , 3.2, 4.7, 1.4],
        [6.4, 3.2, 4.5, 1.5],
        [6.9, 3.1, 4.9, 1.5],
        [5.5, 2.3, 4. , 1.3],
        [6.5, 2.8, 4.6, 1.5],
        [5.7, 2.8, 4.5, 1.3],
        [6.3, 3.3, 4.7, 1.6],
        [4.9, 2.4, 3.3, 1. ],
        [6.6, 2.9, 4.6, 1.3],
        [5.2, 2.7, 3.9, 1.4],
        [5. , 2. , 3.5, 1. ],
        [5.9, 3. , 4.2, 1.5],
        [6. , 2.2, 4. , 1. ],
        [6.1, 2.9, 4.7, 1.4],
        [5.6, 2.9, 3.6, 1.3],
        [6.7, 3.1, 4.4, 1.4],
        [5.6, 3. , 4.5, 1.5],
        [5.8, 2.7, 4.1, 1. ],
        [6.2, 2.2, 4.5, 1.5],
        [5.6, 2.5, 3.9, 1.1],
        [5.9, 3.2, 4.8, 1.8],
        [6.1, 2.8, 4. , 1.3],
        [6.3, 2.5, 4.9, 1.5],
        [6.1, 2.8, 4.7, 1.2],
        [6.4, 2.9, 4.3, 1.3],
        [6.6, 3. , 4.4, 1.4],
        [6.8, 2.8, 4.8, 1.4],
        [6.7, 3. , 5. , 1.7],
        [6. , 2.9, 4.5, 1.5],
        [5.7, 2.6, 3.5, 1. ],
        [5.5, 2.4, 3.8, 1.1],
        [5.5, 2.4, 3.7, 1. ],
        [5.8, 2.7, 3.9, 1.2],
        [6. , 2.7, 5.1, 1.6],
        [5.4, 3. , 4.5, 1.5],
        [6. , 3.4, 4.5, 1.6],
        [6.7, 3.1, 4.7, 1.5],
        [6.3, 2.3, 4.4, 1.3],
        [5.6, 3. , 4.1, 1.3],
        [5.5, 2.5, 4. , 1.3],
        [5.5, 2.6, 4.4, 1.2],
        [6.1, 3. , 4.6, 1.4],
        [5.8, 2.6, 4. , 1.2],
        [5. , 2.3, 3.3, 1. ],
        [5.6, 2.7, 4.2, 1.3],
        [5.7, 3. , 4.2, 1.2],
        [5.7, 2.9, 4.2, 1.3],
        [6.2, 2.9, 4.3, 1.3],
        [5.1, 2.5, 3. , 1.1],
        [5.7, 2.8, 4.1, 1.3],
        [6.3, 3.3, 6. , 2.5],
        [5.8, 2.7, 5.1, 1.9],
        [7.1, 3. , 5.9, 2.1],
        [6.3, 2.9, 5.6, 1.8],
        [6.5, 3. , 5.8, 2.2],
        [7.6, 3. , 6.6, 2.1],
        [4.9, 2.5, 4.5, 1.7],
        [7.3, 2.9, 6.3, 1.8],
        [6.7, 2.5, 5.8, 1.8],
        [7.2, 3.6, 6.1, 2.5],
        [6.5, 3.2, 5.1, 2. ],
        [6.4, 2.7, 5.3, 1.9],
        [6.8, 3. , 5.5, 2.1],
        [5.7, 2.5, 5. , 2. ],
        [5.8, 2.8, 5.1, 2.4],
        [6.4, 3.2, 5.3, 2.3],
        [6.5, 3. , 5.5, 1.8],
        [7.7, 3.8, 6.7, 2.2],
        [7.7, 2.6, 6.9, 2.3],
        [6. , 2.2, 5. , 1.5],
        [6.9, 3.2, 5.7, 2.3],
        [5.6, 2.8, 4.9, 2. ],
        [7.7, 2.8, 6.7, 2. ],
        [6.3, 2.7, 4.9, 1.8],
        [6.7, 3.3, 5.7, 2.1],
        [7.2, 3.2, 6. , 1.8],
        [6.2, 2.8, 4.8, 1.8],
        [6.1, 3. , 4.9, 1.8],
        [6.4, 2.8, 5.6, 2.1],
        [7.2, 3. , 5.8, 1.6],
        [7.4, 2.8, 6.1, 1.9],
        [7.9, 3.8, 6.4, 2. ],
        [6.4, 2.8, 5.6, 2.2],
        [6.3, 2.8, 5.1, 1.5],
        [6.1, 2.6, 5.6, 1.4],
        [7.7, 3. , 6.1, 2.3],
        [6.3, 3.4, 5.6, 2.4],
        [6.4, 3.1, 5.5, 1.8],
        [6. , 3. , 4.8, 1.8],
        [6.9, 3.1, 5.4, 2.1],
        [6.7, 3.1, 5.6, 2.4],
        [6.9, 3.1, 5.1, 2.3],
        [5.8, 2.7, 5.1, 1.9],
        [6.8, 3.2, 5.9, 2.3],
        [6.7, 3.3, 5.7, 2.5],
        [6.7, 3. , 5.2, 2.3],
        [6.3, 2.5, 5. , 1.9],
        [6.5, 3. , 5.2, 2. ],
        [6.2, 3.4, 5.4, 2.3],
        [5.9, 3. , 5.1, 1.8]]

# print(irisDataSet)

from sklearn.cluster import KMeans

# Fit k-means with three clusters, one for each expected species.
# n_init is set explicitly so the result does not depend on the library default.
km = KMeans(n_clusters=3, n_init=10, random_state=21)
km.fit(irisDataSet)

centers = km.cluster_centers_
print(centers)

[[5.9016129  2.7483871  4.39354839 1.43387097]
 [5.006      3.428      1.462      0.246     ]
 [6.85       3.07368421 5.74210526 2.07105263]]
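
A fitted k-means model predicts by assigning a new measurement to the cluster whose centre is nearest. The sketch below imitates that step in plain Python using the centres printed above (rounded); the function name `nearest_cluster` is hypothetical. Which species each cluster index corresponds to is not part of the output above and would have to be established separately, for example by comparing cluster assignments with known labels.

```python
import math

# Cluster centres copied from the output above, rounded to four decimal places.
# Columns: sepal length, sepal width, petal length, petal width (cm).
centers = [
    [5.9016, 2.7484, 4.3935, 1.4339],
    [5.006, 3.428, 1.462, 0.246],
    [6.85, 3.0737, 5.7421, 2.0711],
]

def nearest_cluster(measurement):
    """Return the index of the centre closest to the measurement."""
    distances = [math.dist(measurement, center) for center in centers]
    return distances.index(min(distances))

# First and last-but-49th style rows of the data set used as example inputs.
print(nearest_cluster([5.1, 3.5, 1.4, 0.2]))
print(nearest_cluster([6.3, 3.3, 6.0, 2.5]))
```

With the fitted ```scikit-learn``` model itself, the equivalent call would be ```km.predict``` on a list of measurements.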




#### References
[19] The Most Comprehensive Guide to K-Means Clustering You’ll Ever Need; Introduction to K-Means Clustering; https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-k-means-clustering/