dgg23/data-analysis

Repo directory:

  • Projects are split into folders

Model frameworks

Others

Techniques

Bayesian Regression

Causal Inference

Applied datasets

Installation

The analyses were built in Python 3.

Virtual environment setup

Some projects have their own requirements/environment. The general environment is set up with:

python3 -m venv dataAnalysisEnv
source dataAnalysisEnv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt

Markdown from Notebooks

jupyter nbconvert notebook.ipynb --to markdown

This is automated via GitHub Actions.
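The same conversion can be scripted for every notebook in the repo. This is a minimal sketch, not the repo's actual GitHub Actions workflow (which isn't shown here). It assumes every notebook under the repo root should be converted with the command above, and that Jupyter's autosave copies in .ipynb_checkpoints folders should be skipped. The helper names notebook_commands and convert_all are hypothetical.

```python
import subprocess
from pathlib import Path


def notebook_commands(root: Path) -> list[list[str]]:
    """Build one `jupyter nbconvert <notebook> --to markdown` command per notebook.

    Assumption: Jupyter autosave copies live in .ipynb_checkpoints and are skipped.
    """
    notebooks = sorted(
        p for p in root.rglob("*.ipynb") if ".ipynb_checkpoints" not in p.parts
    )
    return [["jupyter", "nbconvert", str(p), "--to", "markdown"] for p in notebooks]


def convert_all(root: Path) -> None:
    """Run each conversion; check=True stops at the first notebook that fails."""
    for cmd in notebook_commands(root):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    convert_all(Path("."))
```

Building the command list separately from running it keeps the file discovery testable without Jupyter installed, and a CI step only needs to call convert_all.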

Standard library

A custom library is installed in development mode so it can continue to be developed.

VSCode

Use the settings.json file in the repo.

Future areas

The aim is for future work to be developed on separate branches and merged into master when finished.

Tools/areas to explore

Datasets to explore

Tasks

  • Build project template repo
  • Publish interpret-ml piece
  • NBA
    • Player position classification model
    • Bayesian sequential team rating
    • Player VAE: how are players related?
      • College stats to NBA VAE
  • M5/M4 forecasting
    • Walmart demand forecasting
    • with LightGBM
    • Greykite
  • PCA via embedding layer
  • NN to predict tempo from a song; generate a dummy dataset
  • Word embeddings plot with HiPlot
    • Plot with PCA first and compare with HiPlot
  • Compare linear regression MC dropout to theoretical results
  • Optimal car charging schedule
  • MediaPipe: 3D audio
    • Face distance JavaScript web app with React
  • Plot UK COVID data against time on a map
  • Autoencoder using transfer learning?
    • What do we use for the decoder?
  • Fit a sinusoid to noisy data
    • Fourier
    • Gradient descent
    • MCMC
    • Variational inference
  • Double dip loss trajectories
  • Fitting NNs to common functions (exp etc.), deep vs wide, number of parameters for given error
  • Fit a NN to seasonal data with Fourier series components
  • DoubleML on heart data to find CATE
  • GitHub Action to publish notebooks (.ipynb) as Markdown
  • Hierarchical models
    • Mixed-effects model: is it the same as a fixed-effects model (linear/logistic regression) with one-hot encoding for the categorical variables plus a fixed effect?
    • Hierarchical Bayesian models, for when we have categorical features with shared effects over other features
    • Fit with MCMC
    • Similarities to ridge regression: only some coefficients are regularised
    • Generate data and fit each model
    • Ref
  • Binomial regression = logistic regression
  • Linear regression = logistic regression, relationship to Linear Thompson Sampling
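For the "Fit a sinusoid to noisy data" task, here is a minimal sketch of the Fourier route. It assumes evenly spaced samples and a single dominant frequency. The largest non-DC FFT peak gives the frequency. With the frequency fixed, the model is linear in the remaining parameters, so ordinary least squares recovers amplitude, phase and offset. The function name fit_sinusoid and the demo values are illustrative only.

```python
import numpy as np


def fit_sinusoid(t: np.ndarray, y: np.ndarray) -> tuple[float, float, float, float]:
    """Fit y ~ A*sin(2*pi*f*t + phi) + c to evenly spaced samples.

    Returns (f, A, phi, c). Assumes uniform spacing and one dominant frequency.
    """
    dt = t[1] - t[0]
    # Step 1 (Fourier): remove the mean so the offset does not dominate bin 0,
    # then take the strongest non-DC frequency bin.
    spectrum = np.abs(np.fft.rfft(y - y.mean()))
    freqs = np.fft.rfftfreq(len(y), d=dt)
    f = float(freqs[1 + np.argmax(spectrum[1:])])

    # Step 2 (least squares): A*sin(w + phi) = a*sin(w) + b*cos(w)
    # with a = A*cos(phi) and b = A*sin(phi), which is linear in (a, b, c).
    w = 2 * np.pi * f * t
    design = np.column_stack([np.sin(w), np.cos(w), np.ones_like(t)])
    (a, b, c), *_ = np.linalg.lstsq(design, y, rcond=None)
    return f, float(np.hypot(a, b)), float(np.arctan2(b, a)), float(c)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 2.0, 400, endpoint=False)  # 200 Hz sampling for 2 s
    y = 2.0 * np.sin(2 * np.pi * 3.0 * t + 0.5) + 1.0 + rng.normal(0.0, 0.3, t.size)
    print(fit_sinusoid(t, y))
```

The frequency estimate is limited to the FFT grid spacing 1/(N*dt). A frequency that falls between bins would need refining, for example with the gradient descent or MCMC approaches listed above, using this estimate as the starting point.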
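For the hierarchical-models items, here is a sketch of "only some coefficients are regularised". It applies a ridge penalty to the one-hot group columns but leaves the intercept free. If we assume Gaussian group effects with known variances, this has the form of Henderson's mixed model equations, with lam equal to the noise variance divided by the group-effect variance. In practice those variances are estimated (e.g. by REML), and this sketch does not do that. The helper names one_hot_design and partial_ridge are hypothetical.

```python
import numpy as np


def one_hot_design(groups: np.ndarray, n_groups: int) -> np.ndarray:
    """Intercept column followed by one one-hot column per group (no level dropped)."""
    X = np.zeros((len(groups), 1 + n_groups))
    X[:, 0] = 1.0
    X[np.arange(len(groups)), 1 + groups] = 1.0
    return X


def partial_ridge(X: np.ndarray, y: np.ndarray, lam: float, penalized: np.ndarray) -> np.ndarray:
    """Minimise ||y - X b||^2 + lam * sum of b[j]^2 over penalised columns j.

    Only columns flagged True in `penalized` are shrunk. With lam > 0 the
    system is solvable even though the intercept and the full one-hot block
    are collinear.
    """
    D = np.diag(penalized.astype(float))
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_groups, per_group = 5, 50
    groups = np.repeat(np.arange(n_groups), per_group)
    effects = rng.normal(0.0, 2.0, size=n_groups)
    y = 10.0 + effects[groups] + rng.normal(0.0, 1.0, size=groups.size)

    X = one_hot_design(groups, n_groups)
    penalized = np.array([False] + [True] * n_groups)  # intercept unpenalised
    for lam in (1e-3, 1.0, 1e6):
        b = partial_ridge(X, y, lam, penalized)
        print(f"lam={lam:g}: intercept={b[0]:.3f}, group effects={np.round(b[1:], 3)}")
```

A small lam reproduces the per-group means, which is the fixed-effects fit with one-hot encoding. A very large lam shrinks every group effect to zero, so the intercept becomes the overall mean. Values in between give partial pooling.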

About

Various data analysis work

Languages

  • Jupyter Notebook 99.2%
  • Other 0.8%