# Web-Scraping with Python

## From 0 to Data Scientist with Python - Reading and Watching List
Work through this list in order. Some resources overlap, but each step usually builds on knowledge from the previous one.

**If you do not have any prior programming experience**

0. Automate the Boring Stuff with Python
    * https://automatetheboringstuff.com/

**If you have prior programming experience**

1. The Quick Python Book
    * https://www.manning.com/books/the-quick-python-book-third-edition
2. Intro to Numerical Computing with NumPy (Beginner)
    * https://www.youtube.com/watch?v=V0D2mhVt7NE
3. Pandas for Data Analysis
    * https://www.youtube.com/watch?v=oGzU688xCUs
4. Web Scraping with BeautifulSoup and Requests
    * https://www.youtube.com/watch?v=ng2o98k983k
5. Data Visualization and Exploration with Python
    * https://www.youtube.com/watch?v=KvZ2KSxlWBY
6. Hands-On Machine Learning with Scikit-Learn and TensorFlow
    * Read up to "Part II. Neural Networks and Deep Learning"
    * http://shop.oreilly.com/product/0636920052289.do
7. Deep Learning with Python, TensorFlow, and Keras tutorial - Playlist
    * https://www.youtube.com/watch?v=wQ8BIBpya2k&list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN
8. But what is a Deep Neural Network?
    * https://www.youtube.com/watch?v=aircAruvnKk
    * https://www.youtube.com/watch?v=IHZwWFHWa-w
9. Deep Learning with Python
    * https://www.manning.com/books/deep-learning-with-python
10. Hands-On Machine Learning with Scikit-Learn and TensorFlow
    * Read from "Part II. Neural Networks and Deep Learning" onward
    * http://shop.oreilly.com/product/0636920052289.do
  
**Additional suggested material**
* JupyterLab as a development environment - Getting Started with JupyterLab (Beginner Level)
  * https://www.youtube.com/watch?v=Gzun8PpyBCo
* Docker for Data Scientists
  * https://towardsdatascience.com/docker-for-data-scientists-5732501f0ba4
* Stanford Lecture Collection | Convolutional Neural Networks for Visual Recognition (Spring 2017) 
  * https://www.youtube.com/watch?v=vT1JzLTH4G4&list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv

# Programming in Python

Consider the following example: you are going grocery shopping...

#### 1. **assign variables, print function**

In [1]:
first_item = 'banana'

In [2]:
print(first_item)

banana


In [3]:
type(first_item)

str
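Other values have other types. A minimal sketch (the variable names `quantity`, `price` and `in_stock` are illustrative and not part of the original notebook) that checks a few built-in types with `type()`:

```python
# A few built-in types, each checked with type()
quantity = 3            # int: whole number
price = 0.5             # float: decimal number
in_stock = True         # bool: True or False
first_item = 'banana'   # str: text

for value in (quantity, price, in_stock, first_item):
    # type(value).__name__ gives the plain type name, e.g. 'int'
    print(repr(value), type(value).__name__)
```

This should print `3 int`, `0.5 float`, `True bool` and `'banana' str`. Note that `bool` is a subclass of `int`, so `isinstance(True, int)` is `True` even though `type(True)` is `bool`.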

...and you also want to check the price of apples.

In [4]:
second_item = 'apple'

In [5]:
shopping_cart = [first_item, second_item]

In [6]:
shopping_cart

['banana', 'apple']

In [7]:
type(shopping_cart)

list

In [8]:
third_item, fourth_item = 'milk', 'tomato'

In [9]:
shopping_cart.append(third_item)
print(shopping_cart)

['banana', 'apple', 'milk']


In [10]:
shopping_cart.append(fourth_item)
print(shopping_cart)

['banana', 'apple', 'milk', 'tomato']


In [11]:
shopping_cart[0]

'banana'

In [12]:
shopping_cart[1]

'apple'

In [13]:
shopping_cart[3]

'tomato'

In [14]:
shopping_cart[0:3]

['banana', 'apple', 'milk']

In [15]:
shopping_cart[-1]

'tomato'
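Slices also accept negative indices and an optional step, and the end index of a slice is excluded, which is why `shopping_cart[0:3]` above stops at `'milk'`. A small sketch on the four-item list as it looks at this point; each slice returns a new list and leaves `shopping_cart` unchanged:

```python
shopping_cart = ['banana', 'apple', 'milk', 'tomato']

print(shopping_cart[-2:])    # last two items: ['milk', 'tomato']
print(shopping_cart[:2])     # first two items: ['banana', 'apple']
print(shopping_cart[::2])    # every second item: ['banana', 'milk']
print(shopping_cart[::-1])   # reversed copy: ['tomato', 'milk', 'apple', 'banana']
print(shopping_cart)         # the original list is unchanged
```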

In [16]:
del shopping_cart[-1]

In [17]:
shopping_cart[3]

IndexError: list index out of range
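One way to avoid the `IndexError` is to check the index against `len()` first; another is to catch the exception. A minimal sketch of the second approach; `get_item` is a hypothetical helper, not a built-in:

```python
def get_item(items, index, default=None):
    """Hypothetical helper: return items[index], or default if the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return default

shopping_cart = ['banana', 'apple', 'milk']
print(get_item(shopping_cart, 2))             # milk
print(get_item(shopping_cart, 3))             # None instead of an IndexError
print(get_item(shopping_cart, 3, 'nothing'))  # nothing
```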

In [18]:
shopping_cart

['banana', 'apple', 'milk']

In [19]:
more_stuff = ['eggs', 'bread', 'pepper', 'salt']

In [20]:
shopping_cart.extend(more_stuff)

In [21]:
shopping_cart

['banana', 'apple', 'milk', 'eggs', 'bread', 'pepper', 'salt']

In [46]:
len(shopping_cart)

7
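`extend()` adds each element of the other list individually, whereas `append()`, used earlier, adds its argument as one single element. A short sketch of the difference (the list names `cart_a` and `cart_b` are illustrative):

```python
cart_a = ['banana', 'apple']
cart_b = ['banana', 'apple']
more_stuff = ['eggs', 'bread']

cart_a.extend(more_stuff)  # adds 'eggs' and 'bread' as separate items
cart_b.append(more_stuff)  # adds the whole list as one nested item

print(cart_a)                    # ['banana', 'apple', 'eggs', 'bread']
print(cart_b)                    # ['banana', 'apple', ['eggs', 'bread']]
print(len(cart_a), len(cart_b))  # 4 3
```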

In [22]:
n = 0 
for item in shopping_cart:
    print(str(n) + ': ' + item)
    n = n + 1

0: banana
1: apple
2: milk
3: eggs
4: bread
5: pepper
6: salt
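Instead of updating the counter `n` by hand, the built-in `enumerate()` yields each index together with its item. This sketch should print the same numbered list as the loop above; the lines are also collected in `lines` so the result can be checked:

```python
shopping_cart = ['banana', 'apple', 'milk', 'eggs', 'bread', 'pepper', 'salt']

lines = []
for n, item in enumerate(shopping_cart):
    line = f'{n}: {item}'  # f-string instead of str(n) + ': ' + item
    lines.append(line)
    print(line)
```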


In [23]:
shopping_cart_singles = {
    "apple",
    "apple",
    "apple",
    "banana",
    "banana",
    "banana",
    "eggs",
    "bread",
    "pepper",
    "salt",
}

In [24]:
shopping_cart_singles

{'apple', 'banana', 'bread', 'eggs', 'pepper', 'salt'}

In [25]:
type(shopping_cart_singles)

set
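Because a set keeps only unique values, converting a list with `set()` is a quick way to drop duplicates, and `in` checks membership. Sets have no order, so indexing such as `unique_items[0]` raises a `TypeError`; use `sorted()` when you need a list in a fixed order. Sketch:

```python
items_with_duplicates = ['apple', 'apple', 'banana', 'banana', 'eggs']

unique_items = set(items_with_duplicates)  # duplicates are dropped
print(len(unique_items))       # 3
print('eggs' in unique_items)  # True
print('milk' in unique_items)  # False
print(sorted(unique_items))    # ['apple', 'banana', 'eggs']
```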

In [26]:
shopping_cart_with_prices = {
    "banana": 1.00,
    "apple": 0.50,
    "milk": 0.99,
    "tomato": 0.10,
    "eggs": 1.00,
    "bread": 2.5,
    "pepper": 5.00,
    "salt": 0.50,
}

In [27]:
shopping_cart_with_prices

{'banana': 1.0,
 'apple': 0.5,
 'milk': 0.99,
 'tomato': 0.1,
 'eggs': 1.0,
 'bread': 2.5,
 'pepper': 5.0,
 'salt': 0.5}

In [28]:
type(shopping_cart_with_prices)

dict

In [29]:
shopping_cart_with_prices[0]

KeyError: 0
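Dictionaries are looked up by key, not by position, so `[0]` fails with `KeyError`. When a key might be missing, `dict.get()` returns a default instead of raising, and `in` checks whether a key exists. A sketch on a shortened copy of the price dictionary:

```python
prices = {'banana': 1.00, 'apple': 0.50, 'milk': 0.99}

print(prices.get('banana'))     # 1.0
print(prices.get('cheese'))     # None, no KeyError
print(prices.get('cheese', 0))  # 0, the given default
print('cheese' in prices)       # False: 'in' checks the keys
print(list(prices)[0])          # banana: first key in insertion order
```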

In [30]:
shopping_cart_with_prices['banana']

1.0

In [31]:
shopping_cart_with_prices['apple']

0.5

In [32]:
n = 0
for item, price in shopping_cart_with_prices.items():
    print('item #' + str(n) + ': ' + item + ', price: ' + str(price))
    n = n + 1

item #0: banana, price: 1.0
item #1: apple, price: 0.5
item #2: milk, price: 0.99
item #3: tomato, price: 0.1
item #4: eggs, price: 1.0
item #5: bread, price: 2.5
item #6: pepper, price: 5.0
item #7: salt, price: 0.5
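The loop above prints prices such as `1.0` and `0.1`. A format specification like `:.2f` inside an f-string shows them with two decimal places; in this sketch `enumerate()` also replaces the manual counter:

```python
prices = {'banana': 1.00, 'apple': 0.50, 'milk': 0.99, 'tomato': 0.10}

lines = []
for n, (item, price) in enumerate(prices.items()):
    # :.2f formats the float with exactly two decimal places
    line = f'item #{n}: {item}, price: {price:.2f}'
    lines.append(line)
    print(line)
```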


In [33]:
shopping_cart_with_prices['banana'] + shopping_cart_with_prices['apple']

1.5

In [34]:
shopping_cart_with_prices['banana'] + 5 * shopping_cart_with_prices['apple']

3.5

In [35]:
(shopping_cart_with_prices["banana"] + 5 * shopping_cart_with_prices["apple"]) / 2

1.75

In [36]:
sum_of_groceries = 0
for item, price in shopping_cart_with_prices.items():
    sum_of_groceries = sum_of_groceries + price
    print(sum_of_groceries)

print("final sum of all groceries: " + str(sum_of_groceries))

1.0
1.5
2.49
2.5900000000000003
3.5900000000000003
6.09
11.09
11.59
final sum of all groceries: 11.59
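Values such as `2.5900000000000003` are not a bug in the loop: floats are stored in binary, and most decimal fractions, like `0.1`, cannot be represented exactly, so small rounding errors accumulate. A sketch of two common remedies, assuming the same prices in the same order: `round()` for display, and the standard-library `decimal` module for exact decimal arithmetic:

```python
from decimal import Decimal

prices = [1.00, 0.50, 0.99, 0.10, 1.00, 2.50, 5.00, 0.50]

# float sum, as in the notebook loop
float_total = 0
for price in prices:
    float_total = float_total + price
print(round(float_total, 2))  # rounded to cents for display

# Decimal values built from strings keep exact decimal digits
decimal_prices = [Decimal(str(p)) for p in prices]
decimal_total = sum(decimal_prices)
print(decimal_total)  # 11.59
```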


In [37]:
money_wallet = 12

In [38]:
if money_wallet < sum_of_groceries:
    print("you cannot afford all these groceries!")
elif money_wallet == sum_of_groceries:
    print("you have exactly enough money!")
else:
    print("you can afford all the groceries!")

you can afford all the groceries!
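The `elif` branch compares floats with `==`, which can fail because of rounding errors like the `2.5900000000000003` above (for example, `0.1 + 0.2 == 0.3` is `False`). A sketch that uses `math.isclose()` instead; `check_budget` is a hypothetical helper that returns the message rather than printing it:

```python
import math

def check_budget(money_wallet, sum_of_groceries):
    """Hypothetical helper: compare available money with the total cost."""
    # check the "exactly enough" case first, with a tolerance for rounding errors
    if math.isclose(money_wallet, sum_of_groceries):
        return "you have exactly enough money!"
    elif money_wallet < sum_of_groceries:
        return "you cannot afford all these groceries!"
    else:
        return "you can afford all the groceries!"

print(0.1 + 0.2 == 0.3)              # False
print(check_budget(0.3, 0.1 + 0.2))  # you have exactly enough money!
print(check_budget(12, 11.59))       # you can afford all the groceries!
```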


In [84]:
def go_shopping(items, money, calculate_sum=False):
    """This function provides calculations for a shopping trip. Works only if
    correct data types are used as inputs.

    Parameters
    ----------
    items : dict
        Shopping items and their prices, provided as a dictionary.
    money : float or int
        Amount of money available.
    calculate_sum : bool
        True if the sum of the groceries should be calculated, False if not.

    Returns
    -------
    float, int or None
        Sum of the groceries (0 if calculate_sum is False),
        or None if the input data types are wrong.
    """

    # only continue if items is a dict and money is a number
    if isinstance(items, dict) and isinstance(money, (float, int)):
        sum_of_groceries = 0
        if calculate_sum:
            for price in items.values():
                sum_of_groceries = sum_of_groceries + price
            if money < sum_of_groceries:
                print("you cannot afford all these groceries!")
            elif money == sum_of_groceries:
                print("you have exactly enough money!")
            else:
                print("you can afford all the groceries!")
        # if calculate_sum is False, no sum is calculated and 0 is returned
    else:
        print("types of input data is wrong!")
        return None
    return sum_of_groceries

In [98]:
help(go_shopping)

Help on function go_shopping in module __main__:

go_shopping(items, money, calculate_sum=False)
    This function provides calculations for a shopping trip. Works only if
    correct data types are used as inputs.

    Parameters
    ----------
    items : dict
        Shopping items and their prices, provided as a dictionary.
    money : float or int
        Amount of money available.
    calculate_sum : bool
        True if the sum of the groceries should be calculated, False if not.

    Returns
    -------
    float, int or None
        Sum of the groceries (0 if calculate_sum is False),
        or None if the input data types are wrong.



In [99]:
# in Jupyter, press Shift+Tab inside the parentheses to show this help

go_shopping()

TypeError: go_shopping() missing 2 required positional arguments: 'items' and 'money'

In [85]:
go_shopping(shopping_cart_with_prices, money_wallet, calculate_sum=True)

you can afford all the groceries!


11.59

In [86]:
sum_of_shopping_markus = go_shopping(
    shopping_cart_with_prices, money_wallet, calculate_sum=True
)

you can afford all the groceries!


In [87]:
sum_of_shopping_markus

11.59

In [88]:
sum_of_shopping_hans_dieter = go_shopping(
    {"beer": 30.00, "wine": 55.00, "vodka": 5.99, "cigarettes": 10.00},
    50,
    calculate_sum=True,
)

you cannot afford all these groceries!


In [89]:
sum_of_shopping_hans_dieter

100.99

In [94]:
sum_of_shopping_tina = go_shopping(
    {"beer": 30.00, "wine": 55.00, "vodka": 5.99, "cigarettes": 10.00},
    '50',
    calculate_sum=False,
)

types of input data is wrong!


In [95]:
sum_of_shopping_tina

In [96]:
sum_of_shopping_tina = go_shopping(
    {"beer": 30.00, "wine": 55.00, "vodka": 5.99, "cigarettes": 10.00},
    50,
    calculate_sum=False,
)

In [97]:
sum_of_shopping_tina

0
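Returning `None` for wrong input types means the caller must check for `None`, as with `sum_of_shopping_tina` above. A common alternative is to raise a `TypeError`. A sketch of such a variant; `go_shopping_strict` is a hypothetical name, and unlike `go_shopping` it returns the total together with a flag instead of printing a message:

```python
def go_shopping_strict(items, money):
    """Hypothetical variant of go_shopping.

    Returns (total, can_afford) and raises TypeError for wrong input types
    instead of printing a message and returning None.
    """
    if not isinstance(items, dict):
        raise TypeError("items must be a dict mapping item names to prices")
    if not isinstance(money, (int, float)):
        raise TypeError("money must be an int or a float")
    total = sum(items.values())
    return total, money >= total

drinks = {"beer": 30.00, "wine": 55.00, "vodka": 5.99, "cigarettes": 10.00}
total, can_afford = go_shopping_strict(drinks, 50)
print(round(total, 2), can_afford)  # 100.99 False

try:
    go_shopping_strict(drinks, '50')
except TypeError as error:
    error_message = str(error)
    print(error_message)  # money must be an int or a float
```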


#### 2. **math, order of operations**

In [21]:
banana = shopping_cart_with_prices['banana']
apple = shopping_cart_with_prices['apple']
banana + 2 * apple

2.0
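Python applies the usual precedence rules: `**` binds tighter than `*`, `/`, `//` and `%`, which bind tighter than `+` and `-`; parentheses override them. A short sketch with the banana and apple prices from the dictionary above:

```python
banana = 1.00
apple = 0.50

print(banana + 2 * apple)    # 2.0: multiplication first
print((banana + 2) * apple)  # 1.5: parentheses first
print(2 ** 3 * apple)        # 4.0: power, then multiplication
print(7 // 2, 7 % 2)         # 3 1: integer division and remainder
print(-2 ** 2)               # -4: ** binds tighter than the unary minus
```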

#### 3. **list operations**

Put apples and bananas in your shopping cart.
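A sketch of a few more list methods for this step, starting from an empty cart:

```python
shopping_cart = []                         # empty list
shopping_cart.append('apple')
shopping_cart.append('banana')
shopping_cart.insert(0, 'banana')          # insert at position 0
print(shopping_cart)                       # ['banana', 'apple', 'banana']
print(shopping_cart.count('banana'))       # 2
shopping_cart.remove('banana')             # removes only the first 'banana'
print(shopping_cart)                       # ['apple', 'banana']
print('apple' in shopping_cart)            # True
print(sorted(shopping_cart, reverse=True)) # ['banana', 'apple']
```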