
Programming-for-Data-Analysis-Project-1

G00364639

This repository is used for the project given during the PROGRAMMING FOR DATA ANALYSIS module of the Higher Diploma in Data Analytics course at ATU.

I created a Jupyter Notebook in Visual Studio Code, and I have added comments to explain the work, along with references.

For this Markdown file, I used the following websites as guides:

  1. https://www.markdownguide.org/basic-syntax
  2. https://www.w3schools.io/file/markdown-cheatsheet/


Introduction

For this project you must create a data set by simulating a real-world phenomenon of your choosing. You may pick any phenomenon you wish – you might pick one that is of interest to you in your personal or professional life. Then, rather than collect data related to the phenomenon, you should model and synthesise such data using Python. We suggest you use the numpy.random package for this purpose. Specifically, in this project you should:

  • Choose a real-world phenomenon that can be measured and for which you could collect at least one-hundred data points across at least four different variables.
  • Investigate the types of variables involved, their likely distributions, and their relationships with each other.
  • Synthesise/simulate a data set as closely matching their properties as possible.
  • Detail your research and implement the simulation in a Jupyter notebook – the data set itself can simply be displayed in an output cell within the notebook.
How to run the program

  1. My GitHub repository is at https://github.com/dectan/Programming-for-Data-Analysis-Project-1
  2. The repository is called "Programming-for-Data-Analysis-Project-1".
  3. This repository contains a .gitignore file, a Jupyter notebook, and a README file.
  4. My Jupyter notebook is called "Project 1.ipynb".
  5. No additional files are required to run the program, as the .csv file is read directly from http://data.marine.ie/downloads/SmartBayIreland/GalwaySampleWeatherData.csv
  6. The libraries that need to be imported are in the first cell of the Jupyter notebook.
  7. *Run All* cells in the notebook.
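Because the notebook reads the .csv straight from the web, running it needs an internet connection. Below is a minimal sketch of that load step, assuming the file is a plain comma-separated file with a header row. The function name `load_weather` is hypothetical and the notebook's own cell may differ.

```python
import pandas as pd

# URL given in the run instructions above
DATA_URL = "http://data.marine.ie/downloads/SmartBayIreland/GalwaySampleWeatherData.csv"


def load_weather(source=DATA_URL):
    """Read the weather CSV from a URL, a local path or a file-like object."""
    df = pd.read_csv(source)
    # Print a quick overview: (rows, columns) and the first few rows
    print(df.shape)
    print(df.head())
    return df
```

In the notebook this would be called as `weather = load_weather()`. If the URL is unavailable, downloading the file once and passing its local path works the same way, since `pd.read_csv` accepts paths, URLs and file-like objects.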

Imported Libraries

  1. import numpy as np
     NumPy is short for "Numerical Python". It allows mathematical and logical operations to be performed efficiently on arrays. NumPy also enables the user to reshape, slice, stack and join arrays.

  2. import pandas as pd
     Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools. It also allows reading from and writing to various file formats, such as .csv. Pandas has functions for analyzing, cleaning, exploring and manipulating data.

  3. import matplotlib.pyplot as plt
     Matplotlib is an open-source, low-level graph plotting library for Python. Using Matplotlib, different types of plots can be created, such as scatter plots, histograms and box plots.

  4. import seaborn as sns
     Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics, and it is designed to work well with pandas DataFrames.

  5. from scipy.stats import dweibull
     dweibull is the double Weibull continuous distribution object from scipy.stats; its rvs() method generates random numbers from that distribution.
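To illustrate how these libraries fit the task described in the Introduction, here is a minimal sketch that simulates 100 rows across four variables with `numpy.random` and `dweibull`, then collects them in a pandas DataFrame. The function name `simulate_weather`, the column names, the units and every distribution parameter are hypothetical assumptions chosen for illustration; they are not taken from the notebook or fitted to the Galway data.

```python
import numpy as np
import pandas as pd
from scipy.stats import dweibull


def simulate_weather(n=100, seed=42):
    """Return n rows of hypothetical weather-like data with four variables.

    All column names, units and distribution parameters are illustrative
    assumptions, not values fitted to the Galway sample data.
    """
    rng = np.random.default_rng(seed)

    # Air temperature (deg C): normal, assumed mean 11 and standard deviation 4
    temperature = rng.normal(loc=11.0, scale=4.0, size=n)

    # Wind speed (knots): Weibull-shaped and never negative; shape 2, scale 12 assumed
    wind_speed = 12.0 * rng.weibull(2.0, size=n)

    # Air pressure (hPa): double Weibull from scipy.stats, symmetric about loc=1013
    pressure = dweibull.rvs(2.0, loc=1013.0, scale=8.0, size=n, random_state=seed)

    # Relative humidity (%): assumed to fall as temperature rises, plus noise,
    # then clipped to the physically valid 0-100 range
    humidity = 90.0 - 1.5 * (temperature - 11.0) + rng.normal(0.0, 5.0, size=n)
    humidity = np.clip(humidity, 0.0, 100.0)

    return pd.DataFrame({
        "temperature_c": temperature.round(1),
        "wind_speed_kn": wind_speed.round(1),
        "pressure_hpa": pressure.round(1),
        "humidity_pct": humidity.round(1),
    })


df = simulate_weather()
print(df.head())
print(df.describe())
```

Fixing the seed makes the simulated data reproducible between runs. The humidity column shows one way to build a relationship between variables into the simulation rather than drawing every column independently.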
