This workshop is provided by the Research and Data Services group in the Claude Moore Health Sciences Library. The Research and Data Services team guides researchers, clinicians, and students in data management and sharing, data cleaning and preparation, and analysis and visualization using R, Python, GraphPad Prism, Excel, SPSS, Stata, and SAS. For more information on this group, or to contact us, visit the links below.
This workshop focuses on the fundamentals of Python, a general-purpose programming language. Before the workshop, please install Anaconda. Links to the install site are below.
- Go here: https://github.com/HSL-Data/IntroPython/
- Click "Code" (green button in upper right corner)
- Click "Download ZIP"
- Unzip the downloaded file and move the resulting folder somewhere easy to find (your Desktop, for example)
From www.python.org: "Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance."
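To make "dynamic typing" concrete: in Python you do not declare a variable's type. The same name can be rebound to values of different types, and the built-in `type()` reports the type of whatever the name currently refers to. A minimal sketch (the variable name `x` is purely illustrative):

```python
# A single name can refer to values of different types over time (dynamic typing)
x = 42
print(type(x))   # <class 'int'>

x = "forty-two"
print(type(x))   # <class 'str'>

x = [4, 2]
print(type(x))   # <class 'list'>
```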
We will be using Anaconda, a distribution of Python that also includes additional packages and software used for data analysis. It makes getting started with programming easier because it installs many of the tools you may need for you.
JupyterLab is a web-based interactive development environment (IDE) that comes with Anaconda. Its flexible interface lets us work with notebooks. IDEs typically come with:
- text editor
- variable explorer
- console
We will need to install a package to enable a variable explorer in JupyterLab. We'll go over this together, so there is no need to install it yourself.
- Both Python and R are open source programming languages
- Python is a general-purpose programming language, while R is a statistical programming language
- However, both languages are still evolving and expanding
- Both can import additional functionality through libraries/packages
- For data analysis/statistics in Python, pandas, NumPy, and SciPy are very popular
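Importing additional functionality looks the same for any package: `import` makes a module available under its name, and you call its functions through that name. The sketch below uses only the standard-library modules `math` and `statistics` (a deliberate substitute, so it runs without extra installs). Packages that Anaconda installs, such as pandas and NumPy, are imported the same way, conventionally as `import pandas as pd` and `import numpy as np`. The values are made up for illustration.

```python
# Import modules from Python's standard library
import math
import statistics

values = [2.0, 4.0, 4.0, 5.0]   # hypothetical example data

# Functions are accessed through the module name
mean_value = statistics.mean(values)   # 3.75
root = math.sqrt(16)                   # 4.0

print(mean_value, root)
```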
- Discuss some basics:
    - What even is Python? What is Anaconda?
    - Why learn Python?
    - What is an IDE?
- Coding basics:
    - Simple variable assignment/types
    - Lists, tuples, dictionaries
    - Looping
    - Conditional statements
    - Functions: built-in and your own
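As a preview of the coding basics, here is a small sketch touching each item above: variable assignment and types, a list, a tuple, a dictionary, loops, a conditional, built-in functions, and a function of your own. All names and values (`scores`, `ages`, `average`, and so on) are hypothetical, chosen only for illustration; the workshop notebooks may use different examples.

```python
# Simple variable assignment and types
count = 3              # int
temperature = 36.6     # float
label = "patient"      # str

# Lists (mutable), tuples (immutable), dictionaries (key -> value)
scores = [88, 92, 75]
point = (1.5, 2.0)
ages = {"alice": 34, "bob": 29}

# Looping: add up the scores one at a time
total = 0
for s in scores:
    total += s

# Looping over a dictionary, with a conditional statement inside the loop
for person, age in ages.items():
    if age >= 30:
        print(person, "is 30 or older")
    else:
        print(person, "is under 30")

# Built-in functions
print(len(scores), max(scores), sum(scores))   # 3 92 255

# A function of your own
def average(numbers):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(numbers) / len(numbers)

print(average(scores))   # 85.0
```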