
This is the updated class for the brecca grant: MUK 2023


atwine/ACE-big-bio-data-class


Big Bio-data Analysis Class 2023

This repository contains the material used in 2023 to teach first-year master's and PhD students majoring in bioinformatics.

Python for Data Science Notebooks

The class follows the book "Python Data Science Handbook" by Jake VanderPlas.

The content in this repository will be studied as follows:

1 - Introduction to NumPy:

NumPy is a fundamental library in Python for numerical computing, and it is widely used in scientific computing, data analysis, and machine learning applications. NumPy stands for "Numerical Python," and it provides an efficient and convenient way to work with large arrays and matrices of numeric data in Python.
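As a minimal sketch of what working with NumPy arrays looks like (the values below are arbitrary), the snippet creates a vector and a matrix, applies element-wise arithmetic, and computes summary statistics:

```python
import numpy as np

# A 1-D array of five numbers and a 2-D array (matrix) with shape (2, 3)
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
matrix = np.arange(6).reshape(2, 3)   # [[0 1 2]
                                      #  [3 4 5]]

# Arithmetic applies element-wise to the whole array at once
scaled = values * 10                  # [10. 20. 30. 40. 50.]
squared = values ** 2                 # [ 1.  4.  9. 16. 25.]

# Summary statistics, over all elements or along one axis
print(values.mean())                  # 3.0
print(matrix.shape)                   # (2, 3)
print(matrix.sum(axis=0))             # column sums: [3 5 7]
```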

NumPy's core is implemented in C, so operations on whole arrays run in compiled code rather than in Python loops, which makes computation on large arrays and matrices fast. It provides a powerful set of tools for array manipulation, indexing, and linear algebra operations.
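The sketch below illustrates the tools mentioned above: a vectorized computation that runs inside NumPy's compiled code, boolean and fancy indexing, and solving a small linear system with `np.linalg.solve`. The random seed, array sizes, and the system of equations are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = rng.normal(size=1_000_000)

# Vectorized: a single call loops over a million values in compiled code,
# instead of a Python-level for loop
total = np.sum(data ** 2)

# Boolean indexing: keep only the positive values
positives = data[data > 0]

# Slicing and fancy indexing on a 2-D array
grid = np.arange(12).reshape(3, 4)
first_row = grid[0, :]          # [0 1 2 3]
last_column = grid[:, -1]       # [ 3  7 11]
picked = grid[[0, 2], [1, 3]]   # elements (0, 1) and (2, 3): [ 1 11]

# Linear algebra: solve 3x + y = 9 and x + 2y = 8, i.e. A @ x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)       # [2. 3.]
```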

2 - Data Manipulation with Pandas:

Pandas is a powerful and popular Python library for data manipulation and analysis. It provides data structures and functions for handling large datasets easily and efficiently: users can clean, filter, group, and merge data, and then apply analysis techniques such as statistical summaries and data visualization. The library is built on top of NumPy and provides the DataFrame and Series objects, which are essential for working with structured (tabular) data. Pandas also integrates well with other Python libraries for scientific computing, such as Matplotlib and Scikit-learn. Its intuitive, easy-to-learn syntax, combined with its rich set of features, makes it a top choice for data analysts, data scientists, and machine learning engineers. In this introduction to Pandas, we will explore its main features and demonstrate how it can be used to analyze and manipulate data.
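Here is a small sketch of the cleaning, filtering, grouping, and merging steps described above. The `samples` and `patients` tables and their column names are made-up toy data for illustration, not data from the course's Data folder.

```python
import numpy as np
import pandas as pd

# Made-up toy data for illustration only; one measurement is missing (NaN)
samples = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3", "S4", "S5"],
    "tissue": ["liver", "liver", "blood", "blood", "blood"],
    "expression": [5.2, 6.1, np.nan, 3.4, 4.0],
})
patients = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3", "S4", "S5"],
    "age": [34, 51, 29, 62, 45],
})

# Cleaning: drop rows whose measurement is missing
clean = samples.dropna(subset=["expression"])

# Filtering: keep rows that satisfy a condition
high = clean[clean["expression"] > 4.5]

# Grouping: mean expression per tissue (a Series indexed by tissue)
per_tissue = clean.groupby("tissue")["expression"].mean()

# Merging: join the two tables on their shared key column
merged = clean.merge(patients, on="sample_id", how="left")
print(merged)
```

A left merge keeps every row of `clean` (in its original order) and adds the matching `age` from `patients`.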

3 - Visualization with Matplotlib:

Matplotlib is a widely used Python library for creating high-quality visualizations of data. It provides a flexible and powerful interface for a wide range of plots and charts, from simple line plots and scatter plots to 3D visualizations and heatmaps. Matplotlib works directly with NumPy arrays, which makes it easy to handle large datasets and to perform calculations on the data before plotting. With Matplotlib, users can customize every aspect of their visualizations, including color schemes, fonts, labels, and legends. Matplotlib also works well alongside other Python libraries such as Pandas, Seaborn (which is built on top of Matplotlib), and SciPy, allowing users to create complex visualizations that incorporate data from multiple sources.
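A minimal sketch of two common plot types with customized colors, line styles, labels, and a legend follows. The data is generated on the fly, and the output file name `example_plots.png` is a hypothetical choice. In a Jupyter notebook you can drop the `matplotlib.use("Agg")` line and display the figure inline instead of saving it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
rng = np.random.default_rng(seed=1)

# One figure with two side-by-side axes
fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot: custom colors, line styles, axis labels, and a legend
ax_line.plot(x, np.sin(x), color="tab:blue", linestyle="-", label="sin(x)")
ax_line.plot(x, np.cos(x), color="tab:orange", linestyle="--", label="cos(x)")
ax_line.set_xlabel("x (radians)")
ax_line.set_ylabel("value")
ax_line.set_title("Line plot")
ax_line.legend()

# Scatter plot: points colored by a third variable, with a colorbar
points = rng.normal(size=(200, 2))
sc = ax_scatter.scatter(points[:, 0], points[:, 1],
                        c=points[:, 0] + points[:, 1], cmap="viridis", s=15)
ax_scatter.set_title("Scatter plot")
fig.colorbar(sc, ax=ax_scatter, label="x + y")

fig.tight_layout()
fig.savefig("example_plots.png", dpi=150)
```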

4 - Machine Learning:

Machine learning is a rapidly growing subfield of artificial intelligence that aims to enable computers to learn from data and make predictions or decisions without being explicitly programmed. In machine learning, algorithms and statistical models are trained on a dataset, often a large one, to recognize patterns and relationships between features, and then use this knowledge to make predictions or decisions on new data.
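To make the train-then-predict idea concrete, here is a sketch of a deliberately simple classifier, a nearest-centroid model, written by hand in NumPy on synthetic data. It is an illustrative choice, not the specific method taught in the course; scikit-learn offers the same idea as `sklearn.neighbors.NearestCentroid`, along with many more capable models.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic training data: two classes scattered around different centers
class0 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
class1 = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))
X_train = np.vstack([class0, class1])
y_train = np.array([0] * 50 + [1] * 50)


def fit_centroids(X, y):
    """Training step: learn one centroid (mean feature vector) per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(X, classes, centroids):
    """Prediction step: assign each row of X to the class with the nearest centroid."""
    # Broadcasting gives a distance matrix of shape (n_samples, n_classes)
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(distances, axis=1)]


classes, centroids = fit_centroids(X_train, y_train)

# Apply the learned "knowledge" (the centroids) to new, unseen points
X_new = np.array([[0.2, -0.1], [2.8, 3.3]])
print(predict(X_new, classes, centroids))  # [0 1]

# Fraction of training points classified correctly
accuracy = (predict(X_train, classes, centroids) == y_train).mean()
```

In practice, performance should be measured on held-out data rather than on the training set; the training accuracy here only checks that the model learned the two clusters.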


File Structure

This repository is organized into the following folders:

  • Assignments: All assignments will be pushed to this folder. Students will be advised on how to work with it.
  • Data: The datasets used in the course; most of them are freely available on Kaggle.
  • Machine Learning Example: A tutorial that applies the principles learned in the course.
  • Python for datascience notebooks: All the study notebooks used in the course are kept in this folder. Students need to have Jupyter installed locally to use these materials offline.
