<img src="https://raw.githubusercontent.com/NSF-EC/INFO490Assets/master/src/dmap/lessons/intro/html/DMAP-smt.png" align="left"/>

# Welcome to Data, Machines, and the 🐍
This is the sequel to [INFO 490](https://uicourses.web.illinois.edu/info490mh) (Introduction to Programming for Data Science).  Our focus will be on **text analysis**, **machine learning**, and of course advancing your **Python** knowledge.  We have a lot of information to share and we hope you will enjoy learning how to move data through machines built with Python.  

Each lesson requires careful reading and understanding.  
Use Piazza to ask for help.  Tag your questions with the **LessonID** (given below).

This is the first lesson and its focus is on making sure you know how to set up the programming and testing environment properly.  


# Introduction to Notebooks
For those of you who have **not** taken INFO490 and haven't coded inside a notebook, the following are good resources:
  * https://colab.research.google.com/notebooks/basic_features_overview.ipynb
  * https://colab.research.google.com/notebooks/intro.ipynb

**TL;DR:** it boils down to code cells and text cells.  Code cells, you run; text cells, you read.  If you get disconnected, you need to re-run all the code cells.  Google does a good job with auto-save (notebooks are saved on your Google Drive), but you should still save via the File menu before taking a break.

<a id="install"></a>
# Notebook Preparation for Lessons
Each lesson will start with a Colab template that will be provided in the schedule tab on the course website.  Once you open the notebook, the first thing you need to do is save it to your Google Drive.  You can organize your drive and folders however works best for you.


###**Step 1**: after making a copy of the original notebook  

*   Make sure this notebook is readable by selecting Share (this allows for testing)
![alt text](https://raw.githubusercontent.com/NSF-EC/INFO490Assets/master/src/dmap/lessons/intro/html/share.png)

Copy the document ID from the share URL into the variable `NOTEBOOK_ID` in the next cell (and run that cell)

In [None]:
# be sure to run this cell after changing NOTEBOOK_ID
NOTEBOOK_ID  = '1GDCmobYye_kk28N35oK8i_BZdSSTyFsT'  # change me!!
LESSON_ID    = 'DMAP:INTRO' # keep this as is
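
If you are unsure which part of the share URL is the document ID, the following minimal sketch pulls it out with a regular expression. The helper name `extract_notebook_id` is hypothetical, and the URL shapes it recognizes (`/drive/<ID>`, `/d/<ID>`, and `?id=<ID>`) are assumptions about how Colab and Drive share links usually look, so check the result against your own link.

```python
import re

def extract_notebook_id(share_url):
    """Return the document ID embedded in a share URL, or None if not found."""
    # Assumed URL shapes (not guaranteed): /drive/<ID>, /d/<ID>, ?id=<ID>
    patterns = [
        r'/drive/([\w-]+)',
        r'/d/([\w-]+)',
        r'[?&]id=([\w-]+)',
    ]
    for pattern in patterns:
        match = re.search(pattern, share_url)
        if match:
            return match.group(1)
    return None

# the ID below is the placeholder value used in the cell above
example_url = 'https://colab.research.google.com/drive/1GDCmobYye_kk28N35oK8i_BZdSSTyFsT?usp=sharing'
print(extract_notebook_id(example_url))
```

The value printed is what you would paste into `NOTEBOOK_ID`.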

###**Step 2**: install the INFO490 git repository and INFO490 IDE (run the next cell)
<img src="https://raw.githubusercontent.com/NSF-EC/INFO490Assets/master/src/dmap/lessons/intro/html/sidebar.png" align="left"/>

• You can **confirm** that the class repository has been mounted <br/>
by looking at the left side and selecting the $\color{red}{\text{Files}}$ icon.

• The same sidebar menu can show you the table of contents 
of each lesson as well

• If you see an **exception**, it's most likely that you didn't make the notebook readable


In [None]:
def install_ide(lesson_id, nb_id, reload=False):
  import os
  # download the bootstrap script only if it is not already present
  if not os.path.exists('Bootstrap.py'):
    !wget 'https://raw.githubusercontent.com/NSF-EC/INFO490Assets/master/src/tools/Bootstrap.py' -O Bootstrap.py > out.txt 2>&1
  try:
    import Bootstrap, importlib
    importlib.reload(Bootstrap)

    boot = Bootstrap.BootStrap()
    return boot.create_ide(lesson_id, nb_id, reload)
  except Exception as e:
    # fall back to a stand-in whose methods all report why testing is unavailable
    class Nop(object):
      def __init__(self, e): self.e = e
      def nop(self, *args, **kw): return ("unable to test:" + self.e, None)
      def __getattr__(self, _): return self.nop
    class IDE():
      tester = Nop(str(e))
      reader = Nop(str(e))
    return IDE()

ide = install_ide(LESSON_ID, NOTEBOOK_ID, True)

###**Step 3**: Test the IDE framework (after you run the cell, it should say: *Hello!* )


In [None]:
ide.tester.hello_world()

A few notes on testing:
*   You will need to re-run steps 1, 2, 3 if you need to re-connect 
*   When you change your code, the notebook is **not** *immediately* saved.  
*   You can use the File Menu (File->Save) to force the notebook to save if the auto-save hasn't run since your latest code change
* The testing framework reads the version saved on your google drive. Each test prints the timestamp of the last save.  

# Coding and Testing 🐍

> Now for the fun.  You can (but are not required to) use the `tester` object created above to test individual functions, test the whole notebook, and check whether it's ready to be uploaded to the official grader.

The first function you need to implement is named `simple_add`.  
Finish writing the function:

In [None]:
def simple_add(a,b):
  #simple_add adds its two parameters
  return None
simple_add(0,0)

Once that is done, you can test it by writing your own tests (recommended) and also by using `tester`.

* Test your code using the methods `test_with_button` or `test_function`.
* **Note**:  `test_with_button` works well if you need to re-load your notebook (it will be much faster).
* Remember, if you change the code, you must save the notebook before testing again.

In [None]:
# pick one!
# ide.tester.test_with_button(simple_add)
print(ide.tester.test_function(simple_add))
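
Writing your own tests can be as simple as a function full of `assert` statements. The sketch below is one possible shape, not part of the course framework: `check_add` and its test cases are hypothetical, and `operator.add` stands in for your finished `simple_add` so the block runs on its own.

```python
import operator

def check_add(add_function):
    """Run a few hand-written checks against a two-argument add function."""
    # hypothetical cases; add whatever inputs you care about
    cases = [
        ((0, 0), 0),
        ((2, 3), 5),
        ((-1, 1), 0),
    ]
    for args, expected in cases:
        result = add_function(*args)
        assert result == expected, f"add{args} returned {result}, expected {expected}"
    return True

# operator.add stands in for your finished simple_add here
print(check_add(operator.add))
```

Once `simple_add` is finished, calling `check_add(simple_add)` runs the same checks against your own code.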

Finish the function `simple_mult`, then test it yourself and with `tester`.

In [None]:
def simple_mult(a,b):
  #simple_mult multiplies its two parameters
  return a+b

You can also test the entire notebook (not recommended until you test each of the functions).


In [None]:
#pick one of these
# ide.tester.test_notebook()
# ide.tester.test_notebook(verbose=True)

# Notes on code writing
All notebooks have to run in the testing framework.  It's important that the following rules are adhered to:
### **Only import libraries requested**  
Each lesson will specify which Python libraries can be imported.  These libraries will also be available on the testing machine.
### **Reading Data will Fail**
Be careful when using ```open(filename, 'r')```.
The ```filename``` will not exist on the test machine (you will, however, be able to read files that you write).
<br/>You can solve this issue by following the next rule.
### **Encapsulate everything into a function**
It's a good habit and practice to always write functions (and classes).  Even the code that you write to test your own functions should be in a function:
```
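def my_fancy_function(data):
    # return the index of every item in data
    return list(range(len(data)))

def test_it():
    # some_file_name is a placeholder for a file that exists only on your machine
    with open(some_file_name, 'r') as fd:
        data = fd.read()
    answer = my_fancy_function(data)
    assert len(answer) == len(data), "Bad Result"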

test_it()
```
The testing framework will not run any function calls that are at the module level (e.g. ```test_it()``` in the above example).
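
To get a feel for what "not running module-level calls" can mean, here is a minimal sketch built on Python's standard `ast` module. It is only an illustration under stated assumptions and is not the course framework's actual implementation; the function name `run_without_module_level_calls` is hypothetical. It drops every bare expression statement at the top level (such as `test_it()`) and executes what remains.

```python
import ast

def run_without_module_level_calls(source):
    """Execute source, skipping bare expression statements at the top level."""
    tree = ast.parse(source)
    # ast.Expr covers statements such as `test_it()` that only evaluate an expression
    kept = [node for node in tree.body if not isinstance(node, ast.Expr)]
    module = ast.Module(body=kept, type_ignores=[])
    namespace = {}
    exec(compile(module, '<notebook>', 'exec'), namespace)
    return namespace

source = '''
calls = []

def test_it():
    calls.append('ran')

test_it()
'''

namespace = run_without_module_level_calls(source)
print(namespace['calls'])              # test_it() was skipped, so the list is empty
print(callable(namespace['test_it']))  # the function itself is still defined
```

Note that this sketch still runs module-level assignments such as `x = some_function()`, which is one more reason to keep such calls inside functions.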

## Reading Data Files
Each lesson comes with its own data files.
You can view these data files (if you want) by looking at the directory you downloaded in the first section.
You read the files by using 
```
ide.reader.read_data_file(filename)
```
Do **not** use `open` directly (it will fail when the code is tested); see the note above.

In [None]:
def test_reader():
  # each lesson comes with its own data files
  # use the ide.reader to read these files
  # these files are part of the INFO490Assets that you downloaded in the first part
  print(ide.reader.read_data_file('data.txt'))

test_reader()

Congrats, you read the data file
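
One way to follow both rules at once (use the reader, keep everything in functions) is to separate reading from processing. In the sketch below, `count_words`, `count_words_in_file`, and `FakeReader` are all hypothetical names, and it assumes `read_data_file` returns the file's contents as a string. The processing function only sees text, so it can be tested without any file at all.

```python
def count_words(text):
    """Count whitespace-separated words in a string."""
    return len(text.split())

def count_words_in_file(reader, filename):
    # reader is expected to be ide.reader; assumes read_data_file returns the file's text
    return count_words(reader.read_data_file(filename))

class FakeReader:
    """Hypothetical stand-in for ide.reader so the sketch runs anywhere."""
    def read_data_file(self, filename):
        return 'move data through machines'

def test_count_words():
    assert count_words('') == 0
    assert count_words_in_file(FakeReader(), 'data.txt') == 4

test_count_words()
```

In a lesson notebook you would pass `ide.reader` instead of `FakeReader()`.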


## Writing Data Files
You can write to the filesystem with `open` in write mode:
```
with open('test.data', 'w') as fd:
    fd.write('WHERE AM I?')
```
A few things to note:
1.   The file is not permanent; it disappears when the Colab runtime is reset
2.   The file is saved at the top level (take a look at the files)

In [None]:
with open('test.data', 'w') as fd:
    fd.write('WHERE AM I?')
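
Files you write yourself can be read back during the same session. The following sketch wraps both steps in functions; the names `write_then_read` and `test_write_then_read` are hypothetical, and a temporary directory is used only so the example cleans up after itself.

```python
import os
import tempfile

def write_then_read(path, text):
    """Write text to path, then read it back within the same session."""
    with open(path, 'w') as fd:
        fd.write(text)
    with open(path, 'r') as fd:
        return fd.read()

def test_write_then_read():
    # a temporary directory keeps the sketch from cluttering the top level
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, 'test.data')
        assert write_then_read(path, 'WHERE AM I?') == 'WHERE AM I?'
        assert os.path.exists(path)
    # the directory and the file are gone once the block exits
    assert not os.path.exists(path)

test_write_then_read()
```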

# That's it for the **First Lesson**!
Run the next cell to see the last part.  Don't forget to submit the lesson assignment.

In [None]:
ide.reader.view_section(1)

# LESSON ASSIGNMENT Submit for Grading ✅
Prepare this notebook for grading by doing the following steps:

1. Finish all the functions and run the tests (see `ide.tester.test_notebook`)
2. Run the `ide.tester.download_solution()` function (next cell).  This will:  
  * Clean and verify the notebook and download a version you can upload to Gradescope.  It does not run tests on it.
  * If there is an Error/Exception, fix it.
  * You can also open the `sandbox_tmp/solution.py` file to match line numbers<br/>by using the sidebar menu (via the file icon) <img src="https://raw.githubusercontent.com/NSF-EC/INFO490Assets/master/src/dmap/lessons/intro/html/folder-icon.png"/>
3. Submit the downloaded **solution.py** to the corresponding assignment on Gradescope. 
  * You must use your official school email on Gradescope. 

See the [Appendix](#scrollTo=Y6FQ_T8fB2nx) if the tester is broken or you don't want to use it.

In [None]:
ide.tester.download_solution()

solution.py contains valid python; it will be downloaded



<a id="prepare"></a>
# APPENDIX 📎
## Alternative Preparation for Submitting a Notebook for Grading
If you decide **not** to use the ide framework (or it's not working), you can still prepare your notebook for submission.  You can download the notebook as a Python file via the $\color{red}{\text{File->Download .py}}$ menu.

But it's important to take a few steps to ensure the notebook can run in a native (non-notebook) Python environment for grading.  This is true for both Jupyter and Colab.  Take the following precautions before submitting it (you can also use `ide.tester.is_notebook_valid_python()` to confirm the notebook is ready):
### **Comment out** any calls to functions that use `open()` for reading/writing
### **Comment out** any magic commands (those that start with % or %%):

In [None]:
%%html
  <div style="color:green;font-size:2em">Be sure to <strong>Comment</strong> me out</div>

### **Wrap** any `!` shell commands in a function so the call can easily be commented **out**:


In [None]:
def danger():
  !date; 
  !uname -a; ls -la
  !find . -name '*.csv'
  !echo "You must comment out the call to danger()"
danger() # wrapping the commands in danger() is the right way; comment out this call before grading

###**Wrap** any code that uses an **IPython** specific library or any library that is not installed on the grader in a function so it can easily be commented **out**:

In [None]:
def more_danger():
  from IPython.display import display, HTML
  display(HTML("<h1>This won't run on the grader!!</h1>"))
#more_danger() # you MUST comment this out before grading
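
The precautions above all come down to one question: does the exported file parse as plain Python? The sketch below checks that with the standard `ast` module. It is an illustration only and makes no claim about how `ide.tester.is_notebook_valid_python()` works internally; the helper name `is_valid_python` and the sample strings are hypothetical.

```python
import ast

def is_valid_python(source):
    """Return (True, None) if source parses as plain Python, else (False, message)."""
    try:
        ast.parse(source)
        return True, None
    except SyntaxError as error:
        return False, f"line {error.lineno}: {error.msg}"

# a commented-out call is fine; leftover magic or shell lines are not
clean = "def danger():\n    pass\n# danger()\n"
with_magic = "%%html\n<div>Be sure to comment me out</div>\n"
with_shell = "def danger():\n    !date\n"

for label, source in [('clean', clean), ('magic', with_magic), ('shell', with_shell)]:
    ok, message = is_valid_python(source)
    print(label, ok, message)
```

A parse check like this only catches syntax problems. It will not notice an import of a library that is missing on the grader, which is why such imports belong inside a function whose call you comment out.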

###In General, 
* Since you will start with a notebook template and always be working with functions, the **ONLY cell you will need to comment out** is the one where the github repository is installed. Even the cell with `install_testing_framework` can be left alone.
* All text cells are ignored, so there's no need to worry about text or formulas in those cells causing an issue.
$$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{all - is - well}$$
* 🏆 One of the goals of this class is to teach good programming practices even though we are in an environment that discourages it.  So you will always be creating functions and classes to contain your code.