# Welcome to the Beginner Python Workshop 

**Topic: opening file formats, pip, docs**

This notebook will give you a basic introduction to the Python world. Each topic mentioned below is also covered in the [tutorials and tutorial videos](https://github.com/GuckLab/Python-Workshops/tree/main/tutorials).

Eoghan O'Connell, Guck Division, MPL, 2021

In [None]:
# notebook metadata you can ignore!
info = {"workshop": "02",
        "topic": ["opening file formats",
                  "pip", "docs"],
        "version" : "0.0.2"}

### How to use this notebook

- Click on a cell (each box is called a cell). Hit "shift+enter" to run the cell!
- You can run the cells in any order!
- The output of runnable code is printed below the cell.
- Check out this [Jupyter Notebook Tutorial video](https://www.youtube.com/watch?v=HW29067qVWk).

See the help tab above for more information!


# What is in this Workshop?
In this notebook we cover:
- How to open different file formats
   - Image files (.png, .tiff)
   - RTDC files (.rtdc)
   - Excel/Spreadsheet files (.csv, .tsv)
- How to install the packages needed to open the above files with `pip`.
- How to look through package documentation.

## How to open different file formats

Different files have different data formats. For example, image data is different to the data in a Microsoft Word document. Excel data is different again!

Therefore, we need different file formats to store different data. Sometimes these file formats are open, such as .hdf5 files. This means we know how the data is stored in the file. For other formats, such as those made by private companies, we might not know exactly how the data is stored in the file.
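
One way to see that formats differ on disk: many binary formats begin with a fixed "signature" of bytes. The sketch below compares the first bytes of some data against a few well-known signatures (PNG, TIFF, HDF5). The helper names `guess_format` and `guess_file_format` are made up for this illustration, the list of signatures is deliberately short, and it assumes the signature sits at the very start of the file.

```python
# Well-known byte signatures found at the very start of some file formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"II*\x00": "tiff (little-endian)",
    b"MM\x00*": "tiff (big-endian)",
    b"\x89HDF\r\n\x1a\n": "hdf5",
}


def guess_format(first_bytes):
    """Return a format name if the bytes start with a known signature."""
    for signature, name in SIGNATURES.items():
        if first_bytes.startswith(signature):
            return name
    return "unknown"


def guess_file_format(path):
    """Read the first 8 bytes of a file and guess its format."""
    with open(path, "rb") as f:
        return guess_format(f.read(8))


# try it on some bytes (no file needed)
print(guess_format(b"\x89PNG\r\n\x1a\n" + b"more data"))
print(guess_format(b"hello, world"))
```

To try it on a real file, you could call `guess_file_format("../data/channel_example.png")`.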

Below we will learn how to open three common file formats in our field:
- Image files (.png, .tiff)
- RTDC files (.rtdc)
- Excel/Spreadsheet files (.csv, .tsv files)

and how to use different Python packages to open each file format.

*If you are not sure how to open a file format, just search for "how do I open (file format) in python?"*

In [None]:
# import some packages we need
import numpy as np
import matplotlib.pyplot as plt

----------------------

### Image files (.png, .tiff)

There are many good packages for opening image files. We will use the `opencv` package. Another option is the `Pillow` library.

- `opencv-python`
    - To install `opencv`, go to your terminal and run `pip install opencv-python`.
    - To import `opencv` and use it when coding, use `import cv2`

- `Pillow`
    - To install `Pillow`, go to your terminal and run `pip install Pillow`.
    - To import `Pillow` and use it when coding, use `from PIL import Image`
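
Note that the name you `pip install` is not always the name you `import`. Below is a small sketch, using only the standard library, that checks whether the import names listed above can be found in your current environment. The helper name `is_installed` is made up for this example.

```python
import importlib.util

# pip name -> import name, as listed above
PACKAGES = {
    "opencv-python": "cv2",
    "Pillow": "PIL",
}


def is_installed(import_name):
    """Return True if `import import_name` would find a module."""
    return importlib.util.find_spec(import_name) is not None


for pip_name, import_name in PACKAGES.items():
    if is_installed(import_name):
        print(f"{import_name} is available")
    else:
        print(f"{import_name} is missing, try: pip install {pip_name}")
```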

#### Try out the `opencv-python` package
(currently doesn't work on Binder)

In [None]:
# import the `opencv-python` package
import cv2

In [None]:
# open an example image of an RTDC channel

im = cv2.imread("../data/channel_example.png", cv2.IMREAD_GRAYSCALE)

In [None]:
# im stands for image
im

In [None]:
# what does it look like?
# How do I plot it?
plt.figure(figsize=(16,9))
plt.imshow(im, cmap="gray")
# plt.colorbar()
plt.show()

In [None]:
# but what is this "im"?
type(im)

In [None]:
# how many dimensions is the image array?
im.ndim

In [None]:
# what shape is it (in pixels)?
im.shape

#### Alternatively: try out the `Pillow` package

In [None]:
# import the `Pillow` package
from PIL import Image

In [None]:
# open an example image of an RTDC channel
im = Image.open("../data/channel_example.png")

In [None]:
# im stands for image
im

In [None]:
# what does it look like?
# How do I plot it?
plt.figure(figsize=(16,9))
plt.imshow(im)
# plt.colorbar()
plt.show()

In [None]:
# but what is this "im"?
type(im)

In [None]:
# convert it to a numpy array
im_arr = np.array(im)
im_arr

In [None]:
# how many dimensions is the image array?
im_arr.ndim

In [None]:
# what shape is it (in pixels)?
im_arr.shape
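
The shape of an image array tells you whether it is grayscale or color. Here is a minimal sketch with made-up arrays (not the workshop image): a grayscale image has two dimensions (height, width), while a color image has a third dimension for the color channels.

```python
import numpy as np

# made-up example data: 4 pixels high, 6 pixels wide
gray = np.zeros((4, 6), dtype=np.uint8)      # grayscale: (height, width)
color = np.zeros((4, 6, 3), dtype=np.uint8)  # color: (height, width, channels)

print(gray.ndim, gray.shape)
print(color.ndim, color.shape)

# rows come first, so this sets the pixel in row 1, column 2
gray[1, 2] = 255
print(gray[1, 2])
```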

**If you need to use image files, check out this `opencv` tutorial: https://www.geeksforgeeks.org/reading-image-opencv-using-python/ or the Pillow documentation: https://pillow.readthedocs.io/en/stable/handbook/index.html**


---------

----------------------

### RTDC files (.rtdc)

Paul Muller has created the .rtdc format in the `dclab` package.

**If you need to use RTDC files, check out the `dclab` documentation: https://dclab.readthedocs.io/en/stable/**

- To install `dclab`, go to your terminal and run `pip install dclab`.

- To import `dclab` and use it when coding, use `import dclab`

In [None]:
# import the `dclab` package
import dclab

In [None]:
# open an example rtdc file
ds = dclab.new_dataset("../data/calibration_beads_47.rtdc")

# ignore the warnings!
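
A note on file paths: forward slashes (`/`) work in Python on Windows, macOS and Linux, whereas backslashes inside a normal string can be read as escape characters. Here is a minimal sketch that builds the path with the standard `pathlib` module instead. The variable name `data_path` is just for illustration.

```python
from pathlib import Path

# build the path piece by piece; pathlib picks the right separator
data_path = Path("..") / "data" / "calibration_beads_47.rtdc"

print(data_path.name)        # file name
print(data_path.suffix)      # file extension
print(data_path.as_posix())  # the same path written with forward slashes
print(data_path.exists())    # does the file exist from where you run this?
```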

In [None]:
# ds stands for dataset
ds

In [None]:
# You can see the number of events by:
print(len(ds))

In [None]:
# how do I look at the images?
# index the ds object just like a dict
print(ds["image"])

In [None]:
# get the first image
ds["image"][0]

In [None]:
# what shape is the image feature?
ds["image"].shape

In [None]:
# what does it look like?
# How do I plot it?
plt.figure()
plt.imshow(ds["image"][0])
plt.show()

In [None]:
# but what is this "ds"?
type(ds)

In [None]:
# what features does it have?
ds.features

In [None]:
# look at some other features...
ds["area_um"][0]

In [None]:
# look at some other features...
ds["deform"][0]

In [None]:
# what happens when you index a number higher than the number of values in the dataset?
ds["deform"][50]
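
Indexing past the end of the data raises an `IndexError`. Below is a minimal sketch with a plain Python list standing in for the feature data (the values are made up). It shows two ways to deal with this: check the length first, or catch the error.

```python
# made-up stand-in for a feature with 47 values
values = [0.01 * i for i in range(47)]

index = 50

# option 1: check the length before indexing
if index < len(values):
    print(values[index])
else:
    print(f"index {index} is too large, there are only {len(values)} values")

# option 2: try it and catch the error
try:
    values[index]
except IndexError as err:
    print("IndexError:", err)
```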

**If you need to use RTDC files, check out the `dclab` documentation: https://dclab.readthedocs.io/en/stable/**


---------

----------------------

### Excel/spreadsheet files (.csv, .tsv)

The `pandas` package is a very popular way of opening .csv or .tsv files.

- To install `pandas`, go to your terminal and run `pip install pandas`.

- To import `pandas` and use it when coding, use `import pandas as pd`

In [None]:
# import the `pandas` package
import pandas as pd

In [None]:
# open an example .csv file
# this data on the Titanic can be downloaded from here: https://osf.io/aupb4/#!
df = pd.read_csv("../data/titanic.csv")

In [None]:
# df stands for dataframe
df

In [None]:
# what is "df"?
type(df)

In [None]:
# how to view the first 10 rows?
df.head(10)

In [None]:
# how about the last 8 rows?
df.tail(8)
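
This section also mentions .tsv files, which separate values with tabs instead of commas. As a sketch, `pd.read_csv` can read them if you pass `sep="\t"`. The small table below is made up and is read from a string rather than a file, so that the example runs on its own.

```python
import io

import pandas as pd

# made-up tab-separated data, stored in a string instead of a file
tsv_text = "name\tage\nAnna\t29\nBen\t35\nCara\t41\n"

# sep="\t" tells pandas that the columns are separated by tabs
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df_tsv.shape)          # (number of rows, number of columns)
print(list(df_tsv.columns))  # column names
print(df_tsv["age"].mean())  # average of the "age" column
```

For a real file you would pass the file path instead of the `io.StringIO` object.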

**If you need to use spreadsheets, check out the `pandas` documentation: https://pandas.pydata.org/pandas-docs/stable/**



---------

### Exercises

(hint: use a search engine to look for answers)

1. Open a simple text file (.txt):
   - first with a plain call to `open`,
   - then using the `with` context manager.

The path to the file is given below. [Have fun](https://www.youtube.com/watch?v=Rnpy3cC673o).

In [None]:
file_path = "../data/example_text_file.txt"



2. Verify the value of the aspect ratio for the .rtdc file.

Look at the [`dclab` documentation](https://dclab.readthedocs.io/en/stable/sec_av_notation.html#scalar-features) to find the definition of aspect ratio!
- Hint: `aspect = size_x / size_y`

In [None]:
ds = dclab.new_dataset("../data/calibration_beads_47.rtdc")

# verify the aspect ratio...


3. Apply a Gaussian filter to the image file.

Hint: try the `gaussian` function from the `filters` module of the `scikit-image` package (`from skimage.filters import gaussian`). You will need to install `scikit-image` first!

In [None]:
im = cv2.imread("../data/channel_example.png", cv2.IMREAD_GRAYSCALE)

# filter the image with a Gaussian
