<a href="https://colab.research.google.com/github/NickPetrilli/AI/blob/main/lab02_ai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Artificial Intelligence
Lab 02

By R. Coleman, Ph.D.

---
The goal of this lab is to get some practice with Python.

### Background
This lab explores Python input-output for Colab.

While there are various tools for automating data manipulation, we want to get some practice for what goes on behind the scenes.

You may ask ChatGPT or a similar tool for help, but please do not use TensorFlow, Pandas, NumPy, etc. for this lab. You will get ample opportunities to use these later.

You will use the [Iris](https://en.wikipedia.org/wiki/Iris_flower_data_set) data set about flowers. It is one of the best-known benchmark data sets in AI.
The data set contains two important kinds of data:
_continuous_ and _categorical_.

Continuous data are real-valued (in Python, double-precision floats) and comparable: they can be ordered and subtracted.
The measurements that describe the flowers in the data set are continuous.
However, the names of the flower species are categorical or, more specifically,
nominal (name-like).
They are not comparable in this sense.
For instance, we cannot subtract a setosa from a versicolor.
To do so is a category mistake.
The categorical aspects of the data have to be encoded in such a way
that makes them comparable in an unbiased way.
We'll deal with this issue in a subsequent lab.
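
As a minimal sketch of the distinction, the snippet below contrasts two measurements with two species names. The numeric values and variable names are illustrative assumptions, not taken from the lab code.

```python
# Continuous measurements (illustrative values): arithmetic and ordering are meaningful.
sepal_a = 5.1
sepal_b = 7.0
difference = sepal_b - sepal_a
is_longer = sepal_b > sepal_a

# Nominal labels: testing for equality is meaningful, subtraction is not.
species_a = "setosa"
species_b = "versicolor"
same_species = species_a == species_b

try:
    species_b - species_a  # Python raises TypeError for str - str
    subtraction_ok = True
except TypeError:
    subtraction_ok = False

print(round(difference, 1), is_longer, same_species, subtraction_ok)
```

Python will order strings alphabetically if asked, but that ordering says nothing about the flowers themselves, which is why the labels need a deliberate encoding.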

For this lab, you are only going to read in the iris data and gather some preliminary statistics.

It could be done in a snap with Pandas. However, **do not** use
Pandas and the like here. For now, we'll focus on base Python and reserve
things like Pandas for another lab when we really need them.

At the bottom of this notebook you will find the delivery instructions.

---



### Part 1
1. Download to your computer the CSV file at this [link](https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv).
2. Open and inspect the file in Excel.
3. Get the min and max of each measurement column (sepal length, sepal width, petal length, and petal width).
4. Get the count of each species: setosa, versicolor, and virginica.
5. Save the spreadsheet as a _.xlsx_ file.

---


### Part 2
1. Do Runtime > Run all.
2. Study the output.

Notice that the minima, maxima, and counts are not correct. In the next part, you're going to fix these.

---

### Part 3
Follow the tasks indicated in the cells.

In [None]:
# Task 1. Recommended: run this cell to install the Python interactive debugger in Colab.
# For details on how to install and use, see https://zohaib.me/debugging-in-google-collab-notebook/
#!pip install -Uqq ipdb

In [None]:
# Task 2. Recommended: run this cell if you want to use the interactive debugger.
#import ipdb

In [18]:
# Task 3. Run this cell once each time you process the data.
import urllib.request

url = 'https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv'
response = urllib.request.urlopen(url)
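
The instruction to rerun Task 3 before each pass is presumably there because the response behaves like a one-way stream: once its lines have been read, iterating again yields nothing. The sketch below illustrates this with `io.BytesIO` as a stand-in for the HTTP response; the name `fake_response` and its contents are hypothetical.

```python
import io

# Stand-in for the HTTP response: a bytes stream with a header and two data rows.
fake_response = io.BytesIO(b"sepal_length,species\n5.1,setosa\n7.0,versicolor\n")

fake_response.readline()  # skip the header row

first_pass = [line for line in fake_response]   # consumes the remaining lines
second_pass = [line for line in fake_response]  # stream is already exhausted

print(len(first_pass))
print(len(second_pass))
```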

In [19]:
# Task 4. Follow the comments and complete the TODOs below.

# We use this import only to get the maximum float value but it's kind of overkill
# because the maximum flower measures are less than 10.
import sys
MAX_FLOAT = sys.float_info.max

# This contains the maximum (so far) measurements.
maxes = [ -MAX_FLOAT, -MAX_FLOAT, -MAX_FLOAT, -MAX_FLOAT ]

# This contains the minimum (so far) measurements.
mins = [ MAX_FLOAT, MAX_FLOAT, MAX_FLOAT, MAX_FLOAT ]

# This contains the known species and their counts (so far).
cats = {'setosa':0,'versicolor':0,'virginica':0}

# Skips the header row which we don't need.
response.readline()

# Process line by line as Iris.csv is organized this way.
for line in response:
    # Decode each line from bytes to string and remove leading/trailing whitespace
    line = line.decode('utf-8').strip()

    # Split the line into fields.
    values = line.split(",")

    # Hint: in TODOs #2-4 below, try a list comprehension.
    # For more details on functional programming (FP), see https://docs.python.org/3/howto/functional.html

    # Convert each field to a float -- except the last which is a category.
    nums = [float(value) for value in values[:4]]
    iris = values[4].strip()

    # Get the maxes and mins of each field.
    for x in range(4):
        if nums[x] > maxes[x]:
            maxes[x] = nums[x]

        if nums[x] < mins[x]:
            mins[x] = nums[x]

    # Get the species and increment the corresponding category.
    # Unknown species names are ignored, as before.
    if iris in cats:
        cats[iris] += 1

    # Uncomment this line as a breakpoint to do single-step debugging, if needed.
    #ipdb.set_trace(context=1)
    # Debugging aid: uncomment the print below to inspect each line.
    # Keep it commented out once the contents are known.
    # print(line)

# Close the response
response.close()

print(f'maxes: {maxes}')
print(f'mins: {mins}')
print(f'cats: {cats}')

maxes: [7.9, 4.4, 6.9, 2.5]
mins: [4.3, 2.0, 1.0, 0.1]
cats: {'setosa': 50, 'versicolor': 50, 'virginica': 50}
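
The list-comprehension hint in Task 4 can be sketched on a few hand-written rows, without any network access. This is one possible approach under stated assumptions, not the required solution: the sample rows are illustrative, and `float("inf")` stands in for `MAX_FLOAT` as the starting sentinel.

```python
# Three illustrative rows in the same comma-separated layout as iris.csv.
rows = [
    "5.1,3.5,1.4,0.2,setosa",
    "7.0,3.2,4.7,1.4,versicolor",
    "6.3,3.3,6.0,2.5,virginica",
]

maxes = [float("-inf")] * 4  # any real measurement is larger
mins = [float("inf")] * 4    # any real measurement is smaller
cats = {}

for row in rows:
    values = row.split(",")
    # List comprehension: convert the four measurement fields to floats.
    nums = [float(v) for v in values[:4]]
    species = values[4].strip()

    # Pair each running extreme with the new value, column by column.
    maxes = [max(m, n) for m, n in zip(maxes, nums)]
    mins = [min(m, n) for m, n in zip(mins, nums)]

    # Count the species, starting from zero for names not seen yet.
    cats[species] = cats.get(species, 0) + 1

print(f"maxes: {maxes}")  # maxes: [7.0, 3.5, 6.0, 2.5]
print(f"mins: {mins}")    # mins: [5.1, 3.2, 1.4, 0.2]
print(f"cats: {cats}")
```

Using `cats.get(species, 0)` avoids listing the species up front, at the cost of silently accepting misspelled names; the fixed dictionary in Task 4 makes the opposite trade-off.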


### Deliverables
1. Share this notebook as view-only. Do not remove the outputs.
2. Copy the link and paste it into the assignment shell for submission.
3. Upload the .xlsx you created in Part 1.
4. Complete the submission [flight checklist](https://docs.google.com/spreadsheets/d/1lgCttHGUIbCUTrd0TZIm4Nxfy8wy3jnIvNv7cUPJ-Gw/edit?usp=sharing).
5. When done, export the checklist as lab02-checklist.pdf, and upload it to the assignment shell.