# Module 3: Numpy, Pandas, and Data Scraping


## Topics covered

1. <a href="#fileio">L1: File Input/Output</a>
1. <a href="#numpy-pandas">L2: Numpy and Pandas</a>
1. <a href="#et">L3: Data Extraction, Transforms, and Loading</a>
1. [Exercises](#Exercises)



## Resources used

 * Free course book: [Think Python](../Resources/thinkpython.pdf)
 * Previous modules' labs and readings




---

## A Note on material

From this module on, the course moves away from small, isolated code snippets toward more complex techniques.
This progression is necessary: the libraries and techniques used in the rest of the course build on the first two modules.

---

# Self-Paced Labs and Readings

<a id="fileio"></a>
## Lesson 1: File Input/Output
1. Readings
    1. [Think Python](../Resources/thinkpython.pdf) Chapter 14: Files, Sections 14.1 - 14.11 
    
1. Labs
    1. [Files Input and Output](./labs/M3-L1-FilesIO.ipynb)
    1. [Parsing Files](./labs/M3-L1-Parsing_Files.ipynb)
1. Practices
    1. [File loading and parsing](./practices/M3-P1-Basic-FileProcessing.ipynb)

<a id="numpy-pandas"></a>
## Lesson 2: Numpy and Pandas

1. Numpy
    1. Readings
        1. [Quickstart Tutorial](https://numpy.org/devdocs/user/quickstart.html)
        1. [numpy cheat sheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)
        1. (Optional) [Numpy for Matlab users](https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html)
        <!-- https://s3.amazonaws.com/dq-blog-files/numpy-cheat-sheet.pdf -->
    1. Labs
        1. [Overview of numpy](./labs/M3-L2-numpy-overview.ipynb)
1. Pandas
    1. Readings
        1. [A brief intro to pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html)
        1. [Data structures in pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html)
        1. Review: [Loading Packages and Libraries](../Module2/labs/M2-L3-LoadPackages.ipynb)
        1. [pandas cheat sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)
    1. Labs
        1. [Overview of pandas](./labs/M3-L2-pandas-overview.ipynb)
        1. Data Loading
            1. [Loading data from CSV into Pandas](./labs/M3-L2-LoadData-Pandas-CSV.ipynb)
            1. [Loading data from TSV into Pandas](./labs/M3-L2-LoadData-pandas-tsv.ipynb)
            1. [Loading data from JSON into Pandas](./labs/M3-L2-LoadData-pandas-json.ipynb)
            1. [Loading data from Excel into Pandas](./labs/M3-L2-LoadData-pandas-excel.ipynb) 
    1. Practices
        1. [Analyze GDP of countries](./practices/M3-P2-Analyze-GDP-with-Pandas.ipynb)

<a id="et"></a>
## Lesson 3: Data Extraction, Transforms, and Loading

1. Readings
    1. [Advanced Data parsing modules and packages](./resources/AdvancedDataParsing.ipynb)
    1. [Python for Data Extraction, Transforms, and Loading](./resources/ETL.ipynb)
1. Labs
    1. Scraping Web Data
        1. [HTML scraping with Beautiful Soup](./labs/M3-L3-Tutorial-BeautifulSoup.ipynb)
        1. [Brief Summary of HTML Structure, Tables](https://web.dsa.missouri.edu/static/PDF/HTMLQuickGuide.pdf)
        1. [How to Inspect a Web Page Source for Scraper Development](https://web.dsa.missouri.edu/static/PDF/AnalyzingHTMLwithTheWebInspector.pdf)
    1. [Advanced JSON parsing](./labs/M3-L3-Advanced-JSON-Parsing.ipynb)
1. Practices
    1. [Scraping with Beautiful Soup and Storing Structured Data](./practices/M3-P3-BS_practice.ipynb)

  



## Exercises

1. [Indexing a book with Python](./exercises/M3-E1-Advanced-FileProcessing.ipynb)
1. [A Search Tool for Data Files](./exercises/M3-E2-FormattedFileProcessing.ipynb)


## If you want to try submitting work

If you want to try submitting work, you can attempt the steps below.
<!--We will go over this in more detail the first couple weeks of the semester in the _Introduction to Data Science and Analytics_ as well.-->

Open a terminal (see the second link in the day's activities).

  * Review our work  (**Note: substitute course folder name below**)  
```
cd course_folder_name
git status
```
  * Stage Changes
```
git add Module3
```
  * Create a commit (i.e., a save point)
```
git commit -m "This is my Module 3 work"
```
  * Send these changes to the server for safe keeping (i.e., publish)
```
git push
```
  * Confirm our working folder is clean and all work is tracked
```
git status
```

----