## Scripting

* We will write a script that automatically creates wordclouds from multiple web pages
* Start with creating a new, empty folder
* Navigate to this folder with the Anaconda Prompt
    * You can right-click on the folder and use 'Anaconda Prompt here'
* Create a new file in the folder called `util.py`
* Open this file with your text editor. If you don't have one, download and install Atom from `atom.io`
* Write the following line in the file and save it:

    `print("Hello!")` 


* Run `python util.py` in the Anaconda Prompt

## Comparison with Jupyter

* The language is the same
* With scripting, no variables are kept in memory between runs: every run starts from scratch
* Jupyter is good for interactive analysis
* "Real" programs in Python usually consist of many scripts (modules) working together
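
One practical consequence of this difference: a notebook displays the value of the last expression in a cell, while a script shows nothing unless you ask for it with `print`. A minimal sketch, where the `titles` list is made-up stand-in data rather than real feed output:

```python
# Stand-in data (an assumption for illustration, not real feed output)
titles = ["first title", "second title"]

# In a Jupyter cell, a bare expression on the last line is displayed.
# In a script, this line is evaluated and its value is silently discarded.
titles

# In a script, output has to be requested explicitly.
print(titles)
```

When moving code from a notebook into a script, it helps to check every cell that ended in a bare expression and wrap that expression in `print`.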



The following notebook cell reads the article titles from an RSS feed with `feedparser`:

    import feedparser

    url = 'http://feeds.nos.nl/nosnieuwseconomie'
    feed = feedparser.parse(url)

    [e.title for e in feed["entries"]]

Output:

    ["'Miljoenenbonus Verhagen nekte postfusie met Belgen'",
     'De andere grote banken gingen SNS al voor met reorganisatie',
     'SNS wil honderden banen schrappen',
     'KLM en piloten bereiken overeenstemming over pensioen',
     'Minder weekendjes Londen na Brexit-referendum',
     'Nederland deze eeuw grootste nettobetaler van de EU',
     "'Nieuwkomers dreigen achter te blijven door gebrekkige hulp'",
     'Nintendo breekt met verleden: Mario nu ook op smartphones',
     'Achmea wil 2000 banen schrappen',
     'Amerikaanse rente omhoog, de euro omlaag',
     "'Werk Belastingdienst niet in gevaar'",
     'Verbruik 4G-data in jaar tijd verdubbeld',
     'Beroepsverbod voor bankier wegens overtreden bankiers-eed',
     'Werkloosheid zakt onder het half miljoen']

## Exercise

* Copy the functions `get_lowercase_words`, `filter_letters` and `get_filtered_lowercase_words` into `util.py`
* Add one line at the end:
    
    `print(get_filtered_lowercase_words("test.. dit is een test"))`
    
* Create a new file called `main.py` in the same folder as `util.py`
* Make it print all the titles of the articles in the RSS feed

## Combining scripts

* You can use the functions in the `util.py` file in `main.py` using `import util`
* The functions are then available as `util.get_filtered_lowercase_words` and so on
* Importing a file executes all of its top-level code, so the `print` statement from earlier also runs
* You can prevent this with a special block of code:

```
if __name__ == "__main__":
    # this block only runs when the file is executed directly, not when it is imported
    print("Hello!")
```
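
Put together, a `util.py` with this guard might look like the sketch below. The three function names come from the exercise, but their bodies are assumed stand-ins written for this illustration, not the original implementations.

```python
# util.py (sketch): function bodies are assumptions, not the course originals


def get_lowercase_words(text):
    """Split the text on whitespace and lowercase every word."""
    return [word.lower() for word in text.split()]


def filter_letters(word):
    """Keep only the alphabetic characters of a word."""
    return "".join(ch for ch in word if ch.isalpha())


def get_filtered_lowercase_words(text):
    """Return the lowercase words of the text, stripped of non-letters."""
    words = [filter_letters(word) for word in get_lowercase_words(text)]
    # Drop words that became empty after filtering (for example "..")
    return [word for word in words if word]


if __name__ == "__main__":
    # Runs with `python util.py`, but not with `import util`
    print(get_filtered_lowercase_words("test.. dit is een test"))
```

With this layout, `python util.py` prints the test list, while `import util` in `main.py` only makes the functions available.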

## Exercise

* Create a script that downloads all pages from the RSS feed, creates wordclouds from them, and saves them to disk
    * Use the functions from `util.py` to get a list of words
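
One possible structure for this script is sketched below. It assumes the third-party `feedparser` and `wordcloud` packages are installed and downloads pages with the standard library's `urllib`. The names `title_to_filename` and `save_wordclouds` are hypothetical helpers, and feeding raw HTML to the word function is a simplification, since tag names will end up among the words.

```python
import re
import urllib.request


def title_to_filename(title, extension=".png"):
    """Turn an article title into a simple file name (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug + extension


def save_wordclouds(feed_url, get_words):
    """Save one wordcloud image per article in the feed.

    get_words is a function that turns text into a list of words,
    for example util.get_filtered_lowercase_words.
    """
    # Third-party packages, imported here so the helper above works without them
    import feedparser
    from wordcloud import WordCloud

    feed = feedparser.parse(feed_url)
    for entry in feed["entries"]:
        # Download the raw page; undecodable bytes are ignored
        with urllib.request.urlopen(entry.link) as response:
            html = response.read().decode("utf-8", errors="ignore")

        words = get_words(html)
        if not words:
            continue  # WordCloud cannot draw an empty text

        cloud = WordCloud().generate(" ".join(words))
        cloud.to_file(title_to_filename(entry.title))
```

From `main.py`, after `import util`, this could be called as `save_wordclouds(url, util.get_filtered_lowercase_words)`. Passing the word function in as an argument keeps the download logic independent of how the words are cleaned.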

## Using GitHub

* `git` is a program for keeping track of changes to code
* Alternatives are `hg` (Mercurial) and `svn` (Subversion), but `git` is by far the most widely used
* `git` is a simple program / command that runs on your computer, but you can collaborate with others using a central repository
* GitHub hosts such repositories; public repositories are free

## Git commands

* If you have installed Git Bash, you can use `git`
* Start Git Bash and navigate to the folder you're working in

```
git status               # show the state of your working folder
git clone {url}          # copy an external repository to your disk
git add {file}           # add a file to version control
git commit -m {message}  # record your changes and added files as a checkpoint
git push                 # push your commits to the external repository
```

## Exercises

* Create an account on GitHub and create a new repository there, then use `git clone` to copy that repository to disk
* Copy the scripts you created to the new folder
* Use `git add` to add the files to your repository
* Create your first commit with
    * `git commit -m 'My first commit!'`
* Push your code with `git push`
* Add a file called `README.md` and write some text in it to introduce your repository
* Add, commit and push again