# Welcome to our session "Introduction to Python"! 🐍

## Today's example
- **import** some data from **spreadsheets**
- **plot** data (bar charts, scatter plots)
- perform some basic **statistical** analysis
- try to make **predictions** using a simple Machine Learning model 🧐



<a target="_blank" href="https://colab.research.google.com/github/UoA-eResearch/ResBaz24Python/blob/main/StartHere_ResBaz24Python.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Housekeeping

- This is an interactive session, so make the most of it: just listening is not enough! There won't be a recording. We are here to help you learn 🧠
- Cameras on! 📸
- If you have any questions, please ask them in the chat 💬
- Please mute your microphone 🎤  
- Be kind 😊
  - all details of the Code of Conduct 🗞️ can be found [here](https://resbaz.auckland.ac.nz/about/)
- This is a 4h session: we have 5 min breaks every hour and a 10 min break after 2h 🕒
- You will be able to vote on (and influence) the content of this session 🗳️
- This notebook and other resources will be shared with you later, so always prioritise following along over note-taking 📝


# What you can expect
<div>
<img src="https://raw.githubusercontent.com/UoA-eResearch/ResBaz24Python/main/Assets/SessionExpectations.svg" width="800"/>
</div>


# Defining goals for today's session (will be revisited at the end of the session)

- we want you to...
  - ... have fun! 🎉
  - ... follow along with us and learn some Python 🐍 while working through a few examples
    - learning by doing/ *"live coding"*
    - we mostly follow [The Carpentries style of teaching](https://swcarpentry.github.io/swc-releases/2017.02/instructor-training/15-practices/)

- we won't have time to
  - ... start with the basics of coding
    - at the University of Auckland, we have these regular sessions:
      - [Introduction to Python](https://research-hub.auckland.ac.nz/digital-research-skills/introduction-to-python) (8h)
      - [Machine Learning (ML)](https://research-hub.auckland.ac.nz/digital-research-skills/machine-learning-workshop)
    - hand-selected links to resources follow at the very end
  - ... discuss different environments for Python
    - we will use Google Colab ([as mentioned](https://resbaz.auckland.ac.nz/setup/) you need a Google account)
    - there are use cases where you won't use a Jupyter Notebook (like this one right here); instead, you can write scripts and run them on the command line
    - you can use an Integrated Development Environment (IDE) like [Visual Studio Code](https://code.visualstudio.com/) or [PyCharm](https://www.jetbrains.com/pycharm/)



# In 3 Statements: What is Python like?
1. **Awesome**, because it is **abstract** and **versatile**
2. The **community** is **huge** (and **AI** tools are well trained on it): MAKE SURE IT IS OK TO USE AI TOOLS WITH YOUR POTENTIALLY SENSITIVE DATA
3. It is **still a programming** language; you need to learn the **syntax** and the **logic** behind it

<div>
<img src="https://github.com/UoA-eResearch/ResBaz24Python/blob/main/Assets/MemeCoding.JPG?raw=1" width="500"/>
</div>

## Intermezzo: What is a Jupyter Notebook? And what is a Google Colab Notebook?

- we oversimplify here
- Jupyter Notebook
  - you can **write** and **run** your **code**, see the **output**, and add **textual descriptions**
  - you can **share** your work with others
  - you can run it on your **local** machine or on a **server or Virtual Machine**
- Google Colab, [JupyterHub](https://research-hub.auckland.ac.nz/research-software-and-computing/advanced-compute/nectar-jupyterhub)/[BinderHub](https://research-hub.auckland.ac.nz/research-software-and-computing/advanced-compute/nectar-binderhub), etc. are web-based environments for running such notebooks
- From a different perspective, a notebook is a sequence of cells that can contain
  - **Markdown** (like this one), which is a way of formatting text with some special characters
  - **code**
  - you have a **play** button ▶️ to run the code in a cell
    - if you look closely, once a cell has run, a number appears in the brackets `[ ]` on the left side of the cell: it shows the order in which the cells were executed
    - the **sequence matters**
      - this is powerful but can also be a source of errors
      - you can go back and change only some bits of your code and run it again
      - you have to familiarise yourself with this kind of non-linear execution (especially if you are used to writing scripts)

 Now, gogogo! 🏃‍♂️

# Poll #1: Why are you attending this session? 🤔

In [None]:
#@markdown ## Enter your answer here and run the cell to send it to us
answer = "" #@param {type: "string"}
import requests
requests.post(f"http://resbaz.auckland-cer.cloud.edu.au/", params={
  "question": 1499,
  "answer": answer,
  "participant_name": "Anonymous",
  "is_correct": True
}).json()

# Bare minimum to get started

## Variables and data types

### more detailed explanation:
We just imported a package (`requests`): a library, i.e. an external component that someone else has built. The main public repository of such packages is the Python Package Index (PyPI), [pypi.org](https://pypi.org/).
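A minimal sketch of importing and using a package. It uses `statistics` from Python's standard library, so nothing needs to be installed; a package from PyPI (such as `requests`) would first be installed, e.g. with `pip install requests` (`!pip install requests` in a Colab cell, although Colab already ships with many popular packages), and then imported the same way. The values are made up.

```python
# `statistics` is part of Python's standard library, so it is already installed
import statistics

heights_cm = [162, 175, 181, 158]  # made-up example values

# call a function that lives inside the imported package: package.function(...)
average = statistics.mean(heights_cm)
print(average)  # 169
```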

# "Over to you"

- for some examples, we have 'gamified' our learning experience and created a neat way for you to test your knowledge
- you can input your result
- it will be uploaded to a server
- we see the result
- we share the results (later)
- and yes! ~~All~~ Most of that was also done in Python, Application Programming Interface (API) calls and all! 🤯
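For the curious, a rough sketch of what happens to your answer: the check cells call `requests.post(...)` with a `params=` dictionary, and `requests` encodes that dictionary as a query string appended to the URL. The standard library's `urllib.parse.urlencode` produces the same kind of string; the values below are only an example.

```python
from urllib.parse import urlencode

# the same kind of dictionary the check cells pass as `params=` to requests.post()
params = {
    "question": 1,
    "answer": 3471,
    "participant_name": "Anonymous",
    "is_correct": True,
}

# urlencode turns the dictionary into a query string (key=value pairs joined by &)
query_string = urlencode(params)
print(query_string)  # question=1&answer=3471&participant_name=Anonymous&is_correct=True
```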

# Exercises Part 1

In [None]:
#@markdown ## Enter your name here

participant_name = "" #@param {type: "string"}

In [None]:
#@markdown ## On a scale from 0 (really bad) to 10 (awesome), how do you like ResBaz so far?

rate = 0  #@param {type: "slider", min: 0, max: 10}

In [None]:
# @title Submit your opinion on ResBaz
import requests
if rate == 10:
  is_correct = True
else:
  is_correct = False
requests.post(f"http://resbaz.auckland-cer.cloud.edu.au/", params={
  "question": 0,
  "answer": rate,
  "participant_name": participant_name,
  "is_correct": is_correct
}).json()

In [None]:
# Exercise 1: sum 1486 and 1985


In [None]:
# @title Check Answer for Exercise 1
import requests
answer = list(Out.values())[-1]
if answer == 3471:
  is_correct = True
else:
  is_correct = False
requests.post(f"http://resbaz.auckland-cer.cloud.edu.au/", params={
  "question": 1,
  "answer": answer,
  "participant_name": participant_name,
  "is_correct": is_correct
}).json()

In [None]:
# Exercise 2: import math and use the math.sqrt() function to get the square root of 81


In [None]:
# @title Check Answer Exercise 2
answer = list(Out.values())[-1]
if answer == 9.0:
  is_correct = True
else:
  is_correct = False
requests.post(f"http://resbaz.auckland-cer.cloud.edu.au/", params={
  "question": 2,
  "answer": answer,
  "participant_name": participant_name,
  "is_correct": is_correct
}).json()

In [None]:
# Exercise 3: use the print function to print "Hello world!"


In [None]:
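# @title Check Answer for Exercise 3
import requests
# print() only displays text and returns None, so nothing is stored in Out;
# instead, we check the source code of the previously executed cell
answer = In[-2]
if 'print("Hello world!")' in answer or "print('Hello world!')" in answer:
  is_correct = True
else:
  is_correct = False
requests.post(f"http://resbaz.auckland-cer.cloud.edu.au/", params={
  "question": 3,
  "answer": answer,
  "participant_name": participant_name,
  "is_correct": is_correct
}).json()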

## Recap
- we have used Python as a calculator
- we have imported a library (i.e. math)
- we have used a variable to store a value
- we have printed the value of a variable

## Next
- we will store a value in a variable
- we will use a different data type

# Variables and data types

- Python is dynamically typed (and relies on duck typing 🦆: what matters is what a value can do, not a declared type)
- We don't have to specify what kind of data we want to store in a variable
- Python will figure it out for us
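A small sketch of dynamic typing: the same variable name can refer to values of different types, and `type()` reports the type of the value it currently holds. The variable name and values are illustrative.

```python
# no type declaration needed: Python infers the type from the value
value = 42
print(type(value))  # <class 'int'>

# the same name can later refer to a value of a different type
value = 3.14
print(type(value))  # <class 'float'>

value = "forty-two"
print(type(value))  # <class 'str'>
```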

In [None]:
# assign variable


In [None]:
# print variable with f strings

In [None]:
# and we can shorten this even further; if we take this to an extreme, it is called Code Golf (https://en.wikipedia.org/wiki/Code_golf) 🏌️
# let's create a new variable that holds the non-rounded result



This works not only for numbers and maths but also for text:

In [None]:
# assign a string

# assign an integer

In [None]:
# Let's see what happens when we try to add a string and a number


On a side note: Python errors tend to be verbose and might seem intimidating. Read them from the bottom up: the last line names the type of error and gives a short message. Use online resources to get help; there is a big community!
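A short sketch of the error from the cell above and two ways around it, assuming a made-up `age` variable: adding a string and an integer raises a `TypeError`, so we either convert the number with `str()` or let an f-string do the conversion.

```python
age = 42  # an int

try:
    message = "I am " + age + " years old"  # str + int is not allowed
except TypeError as error:
    # this is the information shown on the last line of a traceback
    print(f"{type(error).__name__}: {error}")

# fix 1: convert the number to a string explicitly
message = "I am " + str(age) + " years old"
# fix 2: an f-string converts the value for us
same_message = f"I am {age} years old"
print(message)  # I am 42 years old
```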

In [None]:
# Make Python show us the type of the variable


And another example of data types and their consequences:

In [None]:
# Let's try to find out how many decimal places a number has
# let's use the number Pi (because its decimal expansion never ends)


# we might think that we can just wrap len() around it, but that doesn't work (a float has no length)

# Convert the float to a string

# Get the length of the string




In [None]:
# Check variable type

On a side note: floats in Python are 64-bit double-precision numbers, which hold roughly 15 to 17 significant decimal digits; anything beyond that is rounded off. A discussion (exceeding today's scope) can be found [here](https://en.wikipedia.org/wiki/Double-precision_floating-point_format)
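A few lines that make the limited precision visible (standard library only): `0.1 + 0.2` is not exactly `0.3`, so floats are best compared with a tolerance such as `math.isclose()`, and `str(math.pi)` shows the shortest decimal text that reproduces the stored value, which for pi is 15 digits after the decimal point.

```python
import math

print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False

# compare floats with a tolerance instead of ==
print(math.isclose(0.1 + 0.2, 0.3))  # True

# the text form of pi, split at the decimal point
pi_text = str(math.pi)
digits_after_point = len(pi_text.split(".")[1])
print(pi_text, digits_after_point)  # 3.141592653589793 15
```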

# More complex data types

- to reiterate: this is a 4h intro
- we will only show you four other data types
  - `list` (list), which is an ordered collection of items
  - `dict` (dictionary), which is a collection of key-value pairs
  - `np.array` ([numpy](https://numpy.org/) array), which is a collection of items with the same data type
    - an alternative to [MATLAB](https://www.mathworks.com/products/matlab.html)
  - `pd.DataFrame` ([pandas](https://pandas.pydata.org/) DataFrame), which is a 2D data structure with columns that can have different data types
    - an alternative to [R](https://www.r-project.org/)
    - (we will import our spreadsheet into this data type)
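A minimal sketch of the first two types, using made-up names and ages: a `list` is accessed by position, a `dict` by key. (`np.array` and `pd.DataFrame` need the `numpy` and `pandas` packages and appear later in the notebook.)

```python
# list: an ordered collection; items are accessed by position
names = ["Ana", "Ben", "Chen"]  # hypothetical names

# dict: key-value pairs; values are accessed by key
ages = {"Ana": 34, "Ben": 29, "Chen": 41}  # made-up values

print(len(names))    # 3
print(ages["Chen"])  # 41

# both can grow after they were created
names.append("Dana")
ages["Dana"] = 25
print(names)  # ['Ana', 'Ben', 'Chen', 'Dana']
```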

In [None]:
# Let's have a list of names



In [None]:
# Let's only show the first element of the list


Wait! Jane? 🤔
Yes, Python is zero-indexed! 🤯
If we want to see the first element of a list, we need to use `list[0]` 🤯

there you go!
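A tiny sketch of zero-based indexing with a hypothetical list of names (not the list used in the live-coding cell):

```python
names = ["Ana", "Ben", "Chen"]  # hypothetical list

print(names[0])  # Ana: index 0 is the first element
print(names[1])  # Ben: index 1 is the second element

# with 3 elements, the valid indices are 0, 1 and 2
last_index = len(names) - 1
print(last_index)  # 2
```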


You can also use negative indices to access elements from the end of the list: `list[-1]` gives you the last element.

You can also put a list in a list; this is then a `list of lists` or a `2D list`.

<img src="https://github.com/UoA-eResearch/ResBaz24Python/blob/main/Assets/veg.png?raw=1" alt="ListOfLists" style="width: 400px"/>


In [None]:
# define a nested list


In [None]:
# or more neatly formatted with pprint

# Pretty print the list of lists



Let's try to find the cilantro 🌿; in other words, let's specify the *correct cell* in this list of lists.
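A sketch of indexing into a list of lists, using a made-up 3 × 3 grid of vegetables (the exact layout in the picture may differ): the first index picks the row (an inner list), the second picks the item within that row.

```python
# a made-up grid; each inner list is one row
shelf = [
    ["carrot", "potato", "onion"],
    ["lettuce", "cilantro", "spinach"],
    ["pumpkin", "kumara", "garlic"],
]

row, column = 1, 1  # second row, second column (zero-indexed)
print(shelf[row])          # ['lettuce', 'cilantro', 'spinach']
print(shelf[row][column])  # cilantro

# negative indices also work at both levels
print(shelf[-1][-1])  # garlic
```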

In [None]:
# Demonstration: How to get a subset of a list


In [None]:
# Demonstration: How to append a list


# what happens if we push the 'play' button on this cell a couple of times?

# Exercises Part 2

In [None]:
# Exercise 4: Use math.sqrt to get the square root of 36 and save it in a variable called square_root


In [None]:
# @title Check Answer for Exercise 4
import requests
if square_root == 6:
  is_correct = True
else:
  is_correct = False
requests.post(f"http://resbaz.auckland-cer.cloud.edu.au/", params={
  "question": 4,
  "answer": square_root,
  "participant_name": participant_name,
  "is_correct": is_correct
}).json()

In [None]:
#@markdown ## Exercise 5
#@markdown ### What is the data type of the variable below?
#@markdown my_fruit = "orange"

data_type = "int"  #@param ['int', 'list', 'string', 'dict']

In [None]:
# @title Check Answer for Exercise 5
import requests
if data_type == "string":
  is_correct = True
else:
  is_correct = False
requests.post(f"http://resbaz.auckland-cer.cloud.edu.au/", params={
  "question": 5,
  "answer": data_type,
  "participant_name": participant_name,
  "is_correct": is_correct
}).json()

In [None]:
#@markdown ## Exercise 6
#@markdown ### What is the data type of the variable below?
#@markdown number_of_oranges = 6

data_type = "list"  #@param ['int', 'list', 'string', 'dict']

In [None]:
# @title Check Answer for Exercise 6
import requests
if data_type == "int":
  is_correct = True
else:
  is_correct = False
requests.post(f"http://resbaz.auckland-cer.cloud.edu.au/", params={
  "question": 6,
  "answer": data_type,
  "participant_name": participant_name,
  "is_correct": is_correct
}).json()

In [None]:
# Exercise 7
# Create a list called my_fruits and store banana, apple, orange


In [None]:
# @title Check Answer for Exercise 7
import requests
import json
if "banana" in my_fruits and "apple" in my_fruits and "orange" in my_fruits:
  is_correct = True
else:
  is_correct = False
requests.post(f"http://resbaz.auckland-cer.cloud.edu.au/", params={
  "question": 7,
  "answer": json.dumps(my_fruits),
  "participant_name": participant_name,
  "is_correct": is_correct
}).json()

## Recap
- we have stored values in a variable
- we have printed things along the way
- we have seen different data types
  - `int` (integer), `float` (floating-point number), `str` (string),
  -  `list` (list), `dict` (dictionary),
  -  `np.array` (numpy array), `pd.DataFrame` (pandas DataFrame)

## Next
- we will cover big concepts in a very short time frame

# "Quick shots"

- we don't expect you to understand everything
- this is fine
- these concepts are very powerful (and fundamental)
- so let's spend 5 minutes going through them rapidly

## Topics
- conditional (=*if statements*)
- repetitions (loops)
- functions

We use this structure to investigate these concepts:

- Q1 What does it **do**
- Q2 How does it **work**
- Q3 How does that **look in code**

## Conditional (=*if statements*)

### Q1 What does it do
- it gives us **options**
- you can think of it as a decision tree

### Q2 How does it work
- you define a **condition**
  - if it is met, something is done
  - if not, something else is done
  - (it can get more complex, but that should suffice for now)
  - there is also Boolean logic (0 = false, 1 = true) and maths involved, but again: out of scope
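A minimal sketch of an `if` / `elif` / `else` chain, assuming a made-up temperature and made-up thresholds: Python checks the conditions from top to bottom and runs only the first block whose condition is `True`.

```python
temperature = 18  # made-up value in degrees Celsius

print(temperature < 10)  # False: a condition evaluates to True or False

if temperature < 10:
    label = "cold"
elif temperature < 25:  # only checked because the first condition was False
    label = "mild"
else:  # runs when none of the conditions above was met
    label = "warm"

print(label)  # mild
```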

### Q3 How does that look in code

In [None]:
# Example 1 if-statement: To check if a number is positive


In [None]:
# Example 1 if-statement: To check if a user is logged in


In [None]:
#@markdown ## Check if a number is even

number = 0  #@param {type: "number"}
if number % 2 == 0:
  print("The number is even!")
else:
  print("The number is odd!")

## Repetitions (=*loops*)

### Q1 What does it do
- this saves you from writing (or repeatedly executing) the same code again and again
- a certain piece of code (a code block, a cell, ...) is executed repeatedly: either once for every item in a collection (a `for` loop) or as long as a condition is met (a `while` loop)

### Q2 How does it work
- we focus on the for loop
- iteration counters are more prominent in other languages, but the same idea applies to Python and might help you build an understanding
- we write something like: for each item in a collection, run the indented (pushed a bit to the right) lines of code

### Q3 How does that look in code
- side note: it is Pythonic to write something like `for name in names` (singular ... plural), whereas other languages introduce a counter (often the iteration variable `i`) and you write something like `for (int i = 0; i < 100; i++)`. What you name your loop variable is up to you! Be consistent! Make sure others (or future-you) can read and understand the code (that also applies to Code Golf!)
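A sketch of the two loop styles mentioned above, plus `enumerate()` for when you need both the position and the item, and a short `while` loop; the names are hypothetical. The indented lines are the body of each loop.

```python
names = ["Ana", "Ben", "Chen"]  # hypothetical names

# Pythonic style: loop over the items directly (singular ... plural)
for name in names:
    print(f"Hello {name}!")

# counter style, closer to other languages: range(len(names)) yields 0, 1, 2
for i in range(len(names)):
    print(i, names[i])

# enumerate() hands you the index and the item together
numbered = []
for i, name in enumerate(names):
    numbered.append(f"{i}: {name}")
print(numbered)  # ['0: Ana', '1: Ben', '2: Chen']

# a while loop repeats as long as its condition is met
countdown = 3
while countdown > 0:
    print(countdown)
    countdown -= 1
```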

In [None]:
# Example 2: loops: Iterate over a list of numbers


In [None]:
# Example 2: loops: Iterate over a list of names


## Functions

### Q1 What does it do
- think of it as a variable on steroids
- instead of putting a virtual Post-it note on a value (for example the integer `i = 42` or the string `"hello"`), you wrap up fragments of code in a function

### Q2 How does it work
- you define your function once and can call it several times: from within the code you are writing right now and from other code (another notebook, another `.py` script, ...)
- we do that all the time when we call a function from a package: `statistics.mean(values)` means "go to the `statistics` package and get the instructions/code/cookbook-style recipe needed to compute the average"
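A minimal sketch of defining a function once and calling it twice; `summarise` and its input values are hypothetical. Inside, it calls `statistics.mean()` from the standard library, which is exactly the kind of package function described above.

```python
import statistics

def summarise(values):
    """Return the mean and the range (max - min) of a list of numbers."""
    average = statistics.mean(values)
    spread = max(values) - min(values)
    return average, spread

# define once, call as often as you like
print(summarise([2, 4, 9]))     # (5, 7)
print(summarise([10, 20, 30]))  # (20, 20)
```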


### Q3 How does that look in code
- side note: OOP, [Magic Methods (=Dunder Methods)](https://youtu.be/1I3fuDR2S9A?si=WGxv06r_FM5R8Yut)
$\rightarrow$ We could probably expand on that for days; this leads to the Object-Oriented Programming (OOP) paradigm, a way of structuring code. It is good if you can remember this term, but it's not vital :)
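For a first glimpse of dunder methods, a hypothetical `Basket` class: defining `__len__` makes the built-in `len()` work on it, and `__repr__` controls how the object is displayed. This is only a taste of OOP, not something you need today.

```python
class Basket:
    """A hypothetical example class holding a list of items."""

    def __init__(self, items):
        # called when we create a Basket(...)
        self.items = list(items)

    def __len__(self):
        # called by the built-in len()
        return len(self.items)

    def __repr__(self):
        # used when the object is displayed (and by print(), since there is no __str__)
        return f"Basket({self.items!r})"


basket = Basket(["apple", "banana"])
print(len(basket))  # 2
print(basket)       # Basket(['apple', 'banana'])
```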



In [None]:
# Example 1: Functions: Add two numbers


In [None]:
# Exercise 8
# Write a function called check_number that takes a number and
# returns "Positive" if the number is positive,
# "Negative" if the number is negative,
# and "Zero" if the number is zero


In [None]:
# @title Check Answer for Exercise 8
import requests
def test_check_number():
    try:
        assert check_number(10) == "Positive"
        assert check_number(-5) == "Negative"
        assert check_number(0) == "Zero"
        return True
    except AssertionError:
        return False

is_correct = test_check_number()
requests.post(f"http://resbaz.auckland-cer.cloud.edu.au/", params={
  "question": 8,
  "answer": In[-2],  # In[-1] is this checking cell itself; In[-2] is the cell you ran just before it
  "participant_name": participant_name,
  "is_correct": is_correct
}).json()

# It is your choice:

- We have **two options** for the next part of the session.
- We **might** have time for both, but we will start with the one that gets the most votes.
- If we run out of time, we will have all needed information **linked in the notebook** and you can try it out in your own time.
- **Again**, it is of utmost importance that you have **fun** and **learn** something new today! 🎉
- Python is a vast field and we can **only scratch the surface today**.
- No *fomo*! 🙅‍♂️

## Option A: The Math problem 🧮
- we use a mathematical function, e.g. `y = x^5` or `y = 0.4*x + 3`
- we add some random **noise** to the data (fancy: Monte Carlo simulation 🎲 $^1$)
- we plot the data
  
$\rightarrow$ pick this one if you **like math and plotting mathematical functions**

## Option B: The waiter and tips 🍽️ 💸
- we **import** a given dataset ([seaborn tips](https://github.com/mwaskom/seaborn-data/tree/master?tab=readme-ov-file))
- we **plot** the data
- we perform some `Exploratory Data Analysis` (EDA)
- we will **add other datasets**
  
$\rightarrow$ pick this one if you **expect that your research will involve tabular data and you want to see how efficiently this can be done in Python**

## If we still have time, for both options we can:
- perform a linear regression (we fit a line to the data, hence *linear* regression)
- inspect some statistical values
- if we find the time, also perform a polynomial regression (we fit a curve to the data; this time really *a curve*)

$^1$ Monte Carlo simulation is just a fancy way of using randomness and lots of tries to find out the chances of something happening, like picking a red candy from a huge jar. Scientists and mathematicians use this method to solve problems that are too tricky to figure out with just a few calculations.
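The candy-jar example from the footnote, sketched as a tiny Monte Carlo simulation with the standard library. The jar (30 red candies out of 100) is made up so that the exact answer, 0.3, is known and the estimate can be compared with it.

```python
import random

# hypothetical jar: 30 red candies out of 100, so the exact chance is 0.3
jar = ["red"] * 30 + ["other"] * 70

rng = random.Random(42)  # fixed seed so the result is reproducible
draws = 100_000
red_count = 0
for _ in range(draws):
    # pick one candy at random (and put it back)
    if rng.choice(jar) == "red":
        red_count += 1

estimate = red_count / draws
print(estimate)  # close to 0.3; more draws usually bring it closer
```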

In [None]:
# @title Start voting now, please :)
dropdown = 'Option A the Math Problem' # @param ["Option A the Math Problem", "Option B the Waiter and Tips"]
import requests
requests.post(f"http://resbaz.auckland-cer.cloud.edu.au/", params={
  "question": 1500,
  "answer": dropdown,
  "participant_name": participant_name,
  "is_correct": True
}).json()

# Option A: The Math problem 🧮

### A1: Using a function to create a distribution and fit a linear regression with matplotlib and scipy
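A compact sketch of the whole A1 idea with `numpy` only. It swaps `scipy.stats.linregress` for `np.polyfit(x, y, 1)`, which also performs a least-squares straight-line fit (but, unlike `linregress`, does not report r or p values); the x-range (0 to 10), the noise level (standard deviation 1) and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed: reproducible "random" noise

x = np.linspace(0, 10, 100)            # 100 x-values between 0 and 10
delta = rng.normal(0, 1, size=x.size)  # Gaussian noise with mean 0 and std 1
y = 0.4 * x + 3 + delta                # y = 0.4*x + 3 + delta

# degree 1 = straight line; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")  # close to 0.4 and 3
```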

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Generate 100 x-values


# Generate some noise


# Create y values using the linear function y = 0.4*x + 3 + delta


In [None]:
# Plot the scatterplot


In [None]:
# Calculate the linear regression

# Create the regression line


# Plot the scatterplot

# Add labels and title


# Show the plot


### A2: Fitting and plotting a non-linear distribution with sklearn and seaborn and pandas

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Define a power function


# Generate values for x


# Compute corresponding y values




#### Plotting alternatives:
To get an initial understanding (rather than a publication-ready figure that others can fully interpret), we can simply use `matplotlib.pyplot` and write `plt.plot(x, y)`


In [None]:
# Plot the function using Seaborn


#### Seaborn: A better workflow
- we create a Pandas DataFrame
  - think of this as a (fancy/powerful) spreadsheet in Python
- we give the columns proper names
- seaborn then uses the column names as axis labels, so we don't need a workaround for the axis titles
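A sketch of the DataFrame step, assuming a hypothetical `power_function` with exponent 3 (note that in Python, powers are written `x ** 3`; `x ^ 3` is a bitwise XOR and fails on float arrays). The column names `x` and `yClean` follow the names used later in this notebook.

```python
import numpy as np
import pandas as pd

def power_function(x, exponent=3):
    """Hypothetical power function y = x**exponent (the exponent is an assumption)."""
    return x ** exponent

x = np.linspace(-2, 2, 50)  # 50 evenly spaced x-values from -2 to 2

# named columns make the data self-describing
df = pd.DataFrame({"x": x, "yClean": power_function(x)})
print(df.head())
```

With seaborn installed, `sns.lineplot(data=df, x="x", y="yClean")` should then label the axes with the column names.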



In [None]:
# import pandas


# Create a DataFrame


In [None]:
# Plot the function using Seaborn


#### We generate some spreadsheet data with our noise definition

- we will use a [Gaussian/Normal distribution](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html) to add some noise to our data
- while we are at it, we also plot the new data to compare it

#### A2.2: Let's create some noisy data and store it as a spreadsheet and a Pandas DataFrame
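A sketch of this step, assuming a cubic `yClean` column, a noise standard deviation of 0.5 and a new column called `yNoisy` (all three are illustrative choices, not the notebook's values). Without a file name, `DataFrame.to_csv()` returns the CSV text; in the notebook you would pass a file name instead, e.g. a hypothetical `df.to_csv("noisy_data.csv", index=False)`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)  # fixed seed for reproducible noise
x = np.linspace(-2, 2, 50)
df = pd.DataFrame({"x": x, "yClean": x ** 3})

# Gaussian noise: mean 0, standard deviation 0.5 (an assumed value)
noise = rng.normal(loc=0.0, scale=0.5, size=len(df))
df["yNoisy"] = df["yClean"] + noise

print(df.head(10))  # the top 10 rows

# without a path, to_csv() returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text.splitlines()[0])  # x,yClean,yNoisy
```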

In [None]:
# Add Gaussian noise


# Create the noisy y values based on the existing yClean column

# Display the top 10 lines of the DataFrame


# Saving a pandas DataFrame to CSV


#### A2.3: Let's plot the noisy data

In [None]:
# Plot our noisy data


#### A2.4: Let's start with the regression (linear and polynomial)

- there are various ways of doing this
- there are different packages we can use
- some are better for visualisation
- some offer detailed statistical analysis
  - `numpy` is often used, it provides some regression functions, often it is called 'under the hood'
  - `scipy` is a more advanced package, it offers more statistical analysis
  - `sklearn` is a machine learning package, it offers a lot of different models, including regression models
  - `statsmodels` is a package that is often used in academia, it offers a lot of statistical analysis

For this session, we will use `sklearn` as it gives us a good balance between visualisation and statistical analysis
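For comparison, a sketch of a degree-3 polynomial regression with `numpy` only. `np.polyfit` solves the same least-squares problem as `PolynomialFeatures(degree=3)` followed by `LinearRegression`, so on the same data the coefficients should agree closely; the data here (a noisy cubic), the noise level and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
x = np.linspace(-2, 2, 100)
y = x ** 3 + rng.normal(0, 0.5, size=x.size)  # cubic curve plus Gaussian noise

# least-squares fit of y = c3*x**3 + c2*x**2 + c1*x + c0 (highest power first)
coefficients = np.polyfit(x, y, deg=3)
print(np.round(coefficients, 2))  # roughly [1, 0, 0, 0]

# evaluate the fitted polynomial, e.g. to draw the regression curve
y_fit = np.polyval(coefficients, x)
```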

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt


# Extracting features (X) and target variable (y)

# Transforming features to polynomial features of degree 3


# Fitting the model


# plot distribution


# Plotting the polynomial regression curve

# Print the coefficients




[some theory](https://seaborn.pydata.org/tutorial/relational.html):
Scatter plots are highly effective, but there is no universally optimal type of visualisation. Instead, the visual representation should be adapted for the specifics of the dataset and to the question you are trying to answer with the plot.
With some datasets, you may want to understand changes in one variable as a function of time, or a similarly continuous variable. In this situation, a good choice is to draw a line plot. In seaborn, this can be accomplished by the lineplot() function, either directly or with relplot() by setting kind="line".

# Option B: The Waiter and the tips

## The dataset we use:
One waiter 🛎️ recorded information about each tip 💸 received
- tip in dollars 💵
- bill in dollars 💵
- sex of the bill payer 👫
- whether there were smokers in the party 🚬
- day of the week 📅
- time of day 🕙
- size of the party 👥

All in all, 244 tips were recorded and published in Bryant, P. G. and Smith, M. (1995) Practical Data Analysis: Case Studies in Business Statistics. Homewood, IL: Richard D. Irwin Publishing

## Import data

In practice, there are several ways to get hold of data.
For this course, we will not worry about collecting data (i.e. *going into the field and asking people, sketching rocks, etc.*).
We will avoid many challenges by assuming that the software used to generate or create the data has some export functionality to common file formats.
We can import data from...
- ... a website (an API, a repository such as GitHub, ...)
- ... a spreadsheet (local file, uploaded file, etc.)
  - we will download an `xlsx` file and then make it more usable in Python
  - we will finally save our processed results back as a new sheet in the `xlsx` file

### B1: Import data from a repository
- we will download an Excel file containing the `tips` dataset (the same data that also ships with the `seaborn` package) and load it with `pandas`

In [None]:
# Define the file name and URL
file_name = "tips.xlsx"
url = "https://github.com/UoA-eResearch/ResBaz24Python/raw/main/Datasets/tips.xlsx"


! wget {url}

# Read the Excel file and load it into a pandas dataframe
tips = pd.read_excel("tips.xlsx", sheet_name="tips")
tips.head()


# use this https://seaborn.pydata.org/tutorial/regression.html

### B2: Exploratory Data Analysis (EDA)

- now that we have imported data from an Excel file and made it usable as a Pandas DataFrame, we will perform some basic EDA
- we can either
  - directly look at the data 🔢 🧐
  - plot it 📈 📊
- let's try to use only a few lines of code to get an **understanding** of what this 'tips' dataset is
- for more details see [this EDA example](https://www.kaggle.com/code/mirjanadmitrovic/data-visualization-using-seaborn-tips-dataset)
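A sketch of the usual first looks at a DataFrame, run on a few made-up rows that only mimic the structure of the tips data (the column names are assumed to match seaborn's version of the dataset): `head()`, `describe()`, `unique()` and `value_counts()`.

```python
import pandas as pd

# made-up rows that only mimic the structure of the tips data
tips_sample = pd.DataFrame({
    "total_bill": [20.00, 35.50, 12.25, 48.00],
    "tip": [3.00, 5.00, 2.00, 7.50],
    "sex": ["Female", "Male", "Male", "Female"],
    "smoker": ["No", "Yes", "No", "No"],
    "day": ["Sat", "Sun", "Sun", "Sun"],
    "time": ["Dinner", "Dinner", "Lunch", "Dinner"],
    "size": [2, 4, 1, 3],
})

print(tips_sample.head(10))               # up to the first 10 rows
print(tips_sample.describe())             # count, mean, std, min, quartiles, max of numeric columns
print(tips_sample["day"].unique())        # the distinct labels in a column
print(tips_sample["day"].value_counts())  # how often each label occurs
```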


### B2.1 Looking at the data


In [None]:
# return first 10 rows of the dataframe


In [None]:
# generate general stats for dataframe

In [None]:
# demonstrate how to return categorical variables for a column in dataframe


# detect labels in categorical variables


### B2.2 Plotting the data
- firstly with just one line of code
- then with a more refined package (Seaborn's `sns.pairplot`)

In [None]:
# We can plot data using the plotting function available with Pandas


In [None]:
# We can create more complex plots passing the dataframe to seaborn plotting functions


Poll:
What can you observe?
How many plots were to be expected? Are there duplicates? Are there too few?


Answer: Our dataset has 7 columns; here we only see 3 × 3 = 9 plots.
- why aren't the other columns shown?
- Because of their data type!
- They contain text (categories such as the day of the week) rather than numbers, and by default the pairplot function only plots numeric columns.
- there are ways to show more data; for example, we can include two more dimensions in a 2D plot: the size of the party and the sex of the bill payer:
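A quick way to see which columns `pairplot` can use by default, again on made-up rows shaped like the tips data: `select_dtypes(include="number")` keeps the numeric columns, and three numeric columns give a 3 × 3 grid.

```python
import pandas as pd

# made-up rows shaped like the tips data
df = pd.DataFrame({
    "total_bill": [20.0, 35.5],
    "tip": [3.0, 5.0],
    "sex": ["Female", "Male"],
    "smoker": ["No", "Yes"],
    "day": ["Sat", "Sun"],
    "time": ["Dinner", "Lunch"],
    "size": [2, 4],
})

numeric_columns = df.select_dtypes(include="number").columns.tolist()
other_columns = df.select_dtypes(exclude="number").columns.tolist()
print(numeric_columns)  # ['total_bill', 'tip', 'size']
print(other_columns)    # ['sex', 'smoker', 'day', 'time']
```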

In [None]:
# We can use other functions in seaborn to visualise categorical and numerical data on the same plot. e.g. relplot


Let's have a look at the non-numeric data types
- Don't worry, this looks like 'a lot' but, in essence, it is a loop over all the categorical (i.e. non-numeric) columns, plotting each of them
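The loop idea without the plotting, as a sketch on made-up rows: iterate over the non-numeric columns and count the labels in each. In the notebook, the loop body would draw a plot instead, for example with seaborn's `countplot` on one subplot per column.

```python
import pandas as pd

df = pd.DataFrame({
    "tip": [3.0, 5.0, 2.0],
    "smoker": ["No", "Yes", "No"],
    "day": ["Sat", "Sun", "Sun"],
})  # made-up rows

label_counts = {}
for column in df.select_dtypes(exclude="number").columns:
    # value_counts() counts how often each label occurs in this column
    label_counts[column] = df[column].value_counts()
    print(label_counts[column], end="\n\n")
```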

In [None]:
# Create a figure with subplots


# Loop over each categorical column and plot


### B2.3 So? What do we learn about our data?

- over to you

## B3: Adding data
  - B3.1 The waiter kept collecting data; we want to append this data to our existing dataset
  - B3.2 Another waiter in the same restaurant collected data, but used a different column order.
  - B3.3 In two other restaurants, similar data was collected. We want to merge all this data into one dataset.

### B3.1 Append more data
- we will append some more data to the `tips` dataframe
- to clearly identify where the data stems from, we will add a new column named 'source' to the existing dataframe
- if we (accidentally) re-run this cell, we will keep adding the same data again; this is not what we want. We will fix this with *a conditional*.
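A sketch of the append-with-a-guard idea on two made-up mini tables: `pd.concat` stacks the rows (the older `DataFrame.append` was removed in pandas 2.0), and the `if` only appends when no 'Day2' rows are present yet, so re-running the cell does not duplicate the data. The `for` loop simply simulates running the cell twice.

```python
import pandas as pd

# made-up stand-ins for the first day and the additional day of data
tips = pd.DataFrame({"total_bill": [20.0, 35.5], "tip": [3.0, 5.0]})
tips2 = pd.DataFrame({"total_bill": [15.0], "tip": [2.5]})

tips["source"] = "Day1"
tips2["source"] = "Day2"

for run in range(2):  # simulate (accidentally) running the cell twice
    # only append if there are no Day2 rows yet
    if not (tips["source"] == "Day2").any():
        tips = pd.concat([tips, tips2], ignore_index=True)

print(tips.shape)  # (3, 3): the Day2 row was added only once
```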

In [None]:
# download the file
url = "https://github.com/UoA-eResearch/ResBaz24Python/raw/main/Datasets/tips_dayB.csv"
! wget {url}


# add a new column 'source' to the tips dataframe and set it to 'Day1'

# Display the top 10 lines of the dataframe


# Load the CSV file into a pandas dataframe and add source column


# Append the tips2 dataframe to the tips dataframe


# Display the top 10 lines of the dataframe

# What's the shape of the new dataframe?

Let's see if there are duplicates in the data

In [None]:
# Find duplicated rows


There is just one duplicated row, which should be fine.
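A sketch of finding and removing duplicates on a made-up table whose first and last rows are identical: `duplicated()` marks repeats of an earlier row, `keep=False` marks every copy, and `drop_duplicates()` keeps the first occurrence.

```python
import pandas as pd

df = pd.DataFrame({
    "total_bill": [20.0, 35.5, 20.0],
    "tip": [3.0, 5.0, 3.0],
})  # made-up rows; the first and last are identical

print(df.duplicated().sum())          # 1: one row repeats an earlier row
print(df[df.duplicated(keep=False)])  # every copy of the duplicated row
deduplicated = df.drop_duplicates()   # keep only the first occurrence
print(deduplicated.shape)             # (2, 2)
```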

### B3.2 Another waiter in the same restaurant collected data but with different column sequence
- we import a dataset from another waiter
  - but this person had a different column sequence
- we will try to merge these two datasets
- then we will try to get an impression of how they differ
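A sketch of why a different column order is not a problem, with made-up rows: `pd.concat` matches columns by name, not by position. The `waiter` labels 'A' and 'B' are illustrative.

```python
import pandas as pd

waiter_a = pd.DataFrame({"total_bill": [20.0, 35.5], "tip": [3.0, 5.0], "day": ["Sat", "Sun"]})
# waiter B recorded the same columns in a different order
waiter_b = pd.DataFrame({"day": ["Fri"], "tip": [2.0], "total_bill": [12.0]})

# label where each row comes from
waiter_a["waiter"] = "A"
waiter_b["waiter"] = "B"

# columns are aligned by name, so waiter B's values land in the right columns
combined = pd.concat([waiter_a, waiter_b], ignore_index=True)
print(combined)
```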

In [None]:
url = "https://github.com/UoA-eResearch/ResBaz24Python/raw/main/Datasets/tips_waiterB.csv"
! wget {url}

# import the data


# add waiter columns to dataframes


# concatenate dataframes


Poll
- what should we do now?
- look at the values?
- look at some plots?

In [None]:
# How could we look at the variety of distributions for both waiters?


### B3.3 We have the same dataset collected from another restaurant
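A sketch of combining two restaurants and comparing them, with made-up numbers and hypothetical restaurant labels: after `pd.concat`, `groupby()` summarises each restaurant separately, and the combined table could then be written out with `to_csv()` or `to_excel()`.

```python
import pandas as pd

restaurant_a = pd.DataFrame({"total_bill": [20.0, 35.5, 12.0], "tip": [3.0, 5.0, 2.0]})
restaurant_b = pd.DataFrame({"total_bill": [18.0, 40.0], "tip": [2.0, 6.0]})

# a column that records which restaurant each row comes from
restaurant_a["restaurant"] = "A"
restaurant_b["restaurant"] = "B"

all_tips = pd.concat([restaurant_a, restaurant_b], ignore_index=True)

# number of tips and average tip per restaurant
summary = all_tips.groupby("restaurant")["tip"].agg(["count", "mean"])
print(summary)
```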

In [None]:
# download the data
url = "https://github.com/UoA-eResearch/ResBaz24Python/raw/main/Datasets/tips_restaurantB.csv"
! wget {url}

# import data from another restaurant


# add a restaurant column and name each restaurant

# concatenate dataframes

In [None]:
# How could we visualise the data from each restaurant in the same plot?

In [None]:
# How could we write our new dataframe containing tips from both restaurants to a file?

# Revisiting the Goals

Did you?
  - ... have fun! 🎉
  - ... follow along while live coding in Python 🐍
  - ... pick up some core coding concepts while doing that?
  - ... make a decision if Python is something you want to learn/use? If so:
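      - the Carpentries-style [Introduction to Python](https://research-hub.auckland.ac.nz/digital-research-skills/introduction-to-python) session
      - the [Machine Learning (ML)](https://research-hub.auckland.ac.nz/digital-research-skills/machine-learning-workshop) workshop
      - the hand-selected links below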
  
Now that you have some experience with Google Colab, which is one flavour of Jupyter Notebook, here are some other options:
  - University of Auckland researchers have access to the [Nectar Research Cloud](https://research-hub.auckland.ac.nz/research-software-and-computing/advanced-compute/nectar-research-cloud) and services like [JupyterHub](https://research-hub.auckland.ac.nz/research-software-and-computing/advanced-compute/nectar-jupyterhub), [BinderHub](https://research-hub.auckland.ac.nz/research-software-and-computing/advanced-compute/nectar-binderhub), and [Nectar Virtual Desktop](https://research-hub.auckland.ac.nz/research-software-and-computing/advanced-compute/nectar-virtual-desktop)
  - if you automate tasks, make machines talk to other machines, etc., you can write scripts and run them on the command line
  - you can use an Integrated Development Environment (IDE) like [Visual Studio Code](https://code.visualstudio.com/) or [PyCharm](https://www.jetbrains.com/pycharm/)

Were your goals met? 🎯


# Resources

## Websites for Learning Python:

-   [HarvardX: Using Python for Research](https://www.edx.org/learn/python/harvard-university-using-python-for-research)
-   [LearnPython.org](https://www.learnpython.org/) This is a free interactive Python tutorial for people who want to learn Python
-   [Google's Python Class](https://developers.google.com/edu/python)
    Google also has an excellent set of Python tutorials for beginners
-   [Codecademy](https://www.codecademy.com/) offers an interactive
    Python course
-   [30 Days of Python](https://github.com/Asabeneh/30-Days-Of-Python)
-   [How to Think Like a Computer Scientist](https://runestone.academy/runestone/books/published/thinkcspy/index.html)

## Video for Learning Python

-   [freeCodeCamp.org](https://youtu.be/rfscVS0vtbw?si=EiY8F-GupxBrtovb)

## Python Blogs:

-   [Real Python](https://realpython.com/) is a comprehensive
    Python programming blog with eye-catching infographics, videos, and
    an overall fun vibe
-   [Planet Python](https://planetpython.org/) is a comprehensive blog
    that brings together recent Python-related posts from various other
    blogs
-   [Finxter](https://blog.finxter.com/) is another Python blog to
    follow
-   [Full Stack Python](https://www.fullstackpython.com/) is a blog that
    provides detailed tutorials on various platforms utilizing Python

## Python Games
- [CodeCombat](https://codecombat.com/): Learn programming through an interactive game-based platform where players write code to control characters and solve challenges.
- [Code on the Cob](https://www.codeonthecob.com/): A website to practice coding
- [CheckiO](https://checkio.org/): Learn Python coding through solving coding challenges in a gamified environment, with a focus on improving coding skills.
- [Advent of Code](https://adventofcode.com/): An annual coding challenge held in December, featuring daily programming puzzles to solve.
- [CodinGame](https://www.codingame.com/start/): Learn coding through gamified programming puzzles and challenges, covering various languages and difficulty levels.
- [LeetCode](https://leetcode.com/): Practice coding skills through a collection of coding problems, ranging from easy to hard, often used for technical interviews.
- [Exercism](https://exercism.org/tracks/python): Learn programming languages through practice problems with mentors, focusing on feedback-driven learning and community support.
