In this notebook I learn more about my computational environment using materials published by the UK Data Service. The reference Jupyter notebook is linked here:
https://github.com/UKDataServiceOpen/comp-soc-sci/blob/master/code/bcss-notebook-four-2020-02-12.ipynb

In [None]:
#Import packages
import pandas as pd



## Knowing your computational environment

All computational social science activities depend on knowing how to set up, manage and share a computational environment (The Turing Way Community, 2019). This can range from understanding how and where files are located on your machine to defining and documenting which software packages, versions and configurations are necessary to execute your data analysis.

Whether you are thinking about scraping a web page or implementing an advanced machine learning algorithm, it all begins with establishing your computational environment. First, let's understand how files are stored and accessed on your machine.


## File system and working directory

It is critical that you think logically and in an organised way about how you manage and store files for your project.

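Files and folders stored on your machine's hard drive can be accessed in two ways:

- Absolute path: the full location of a file or folder, starting from the root of the drive (e.g., `C:\Users\sonja\data`)
- Relative path: the location relative to the current working directory (e.g., `data`)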
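Here is a minimal sketch of the two kinds of path, using the standard-library `pathlib` module. The folder name `data` is only an example, matching the folder used later in this notebook. Neither line of code requires the folder to exist.

```python
from pathlib import Path

# A relative path is interpreted from the current working directory
relative = Path("data")

# Joining it onto the working directory gives an absolute path
# (this only builds the path; it does not check that the folder exists)
absolute = Path.cwd() / relative

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True
print(absolute)                # e.g. C:\Users\sonja\data on the author's machine
```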

Let's start by finding out where we are on our machine:


In [1]:
import os  # operating system package

os.getcwd()  # where am I? Returns the current working directory

'C:\\Users\\sonja'

In [2]:
#What other files and folders exist where we are?
os.listdir()

['.anaconda',
 '.canopy_runtimes.json',
 '.conda',
 '.condarc',
 '.dotnet',
 '.gitconfig',
 '.ipynb_checkpoints',
 '.ipython',
 '.jupyter',
 '.matplotlib',
 '.plotly',
 '.spyder-py3',
 '.vscode',
 '3D Objects',
 'Anaconda3',
 'AppData',
 'Application Data',
 'Borough_inj_proportion.png',
 'Canopy',
 'Ch3 Processing_Raw_Text.ipynb',
 'Chatbot_data.ipynb',
 'Chatbot_NLP_experiment.ipynb',
 'Contacts',
 'Cookies',
 'CW_LFB - change in attendance times.ipynb',
 'CW_LFB - prep for ML classification, modelling and results.ipynb',
 'CW_LFB2018 - prep for machine learning classification-Copy1.ipynb',
 'datasets',
 'Datavis_barchartsscatterplots.ipynb',
 'Dataviz_linecharts.ipynb',
 'data_F_DI.csv',
 'Desktop',
 'Determine best market.ipynb',
 'df1.csv',
 'df2.csv',
 'df3.csv',
 'diabetes.png',
 'DISS - EDA London Geog PrimaryFires_2018.ipynb',
 'Diss_All_buildings_databaseLFB.ipynb',
 'DISS_Data_prep_decision_tree_modelling_02052019_THISWORKED!.ipynb',
 'Diss_Data_prep_decision_tree_modelling_
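The listing above mixes files and folders together as bare names. The small sketch below splits them into the two groups, using only the standard `os.path.isdir` and `os.path.isfile` functions:

```python
import os

# os.listdir() returns the names of files and folders in the working directory
entries = os.listdir()

# isdir/isfile resolve each bare name against the current working directory
folders = sorted(name for name in entries if os.path.isdir(name))
files = sorted(name for name in entries if os.path.isfile(name))

print(len(folders), "folders and", len(files), "files")
print(folders[:5])  # first few folder names
```

Because the names are bare, this only works for the working directory itself. To inspect another folder, each name would first need to be joined onto that folder's path with `os.path.join`.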

In [3]:
# Using a "data" folder as an example, build its absolute path from the relative path "data".
# Note: abspath only joins the name onto the working directory; it does not check that the folder exists.
absolute_path = os.path.abspath("data")
absolute_path

'C:\\Users\\sonja\\data'
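The cell above shows only the absolute path. `os.path.relpath` works in the opposite direction: it expresses a path relative to a starting directory, which defaults to the current working directory. A small sketch:

```python
import os

absolute_path = os.path.abspath("data")

# Relative to the working directory, the absolute path collapses back to "data"
relative_path = os.path.relpath(absolute_path)
print(relative_path)  # data

# Relative to the parent of the working directory, the same folder gains a prefix,
# e.g. sonja\data on the author's machine
print(os.path.relpath(absolute_path, start=os.path.dirname(os.getcwd())))
```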

In [4]:
os.listdir(absolute_path)
# raises FileNotFoundError: there is no "data" folder in the current working directory

FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Users\\sonja\\data'
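One way to avoid this error is to check that the folder exists before listing it, and optionally to create it. The helper `list_folder` and its `create` flag are hypothetical, written only for illustration. They rely on the standard `os.path.isdir`, `os.makedirs` and `os.listdir` functions.

```python
import os

def list_folder(path, create=False):
    """Hypothetical helper: list a folder's contents, checking first that it exists."""
    if not os.path.isdir(path):
        if not create:
            print(f"No folder at {path!r}; nothing to list.")
            return []
        # makedirs also creates missing parent folders;
        # exist_ok=True means no error is raised if the folder already exists
        os.makedirs(path, exist_ok=True)
    return os.listdir(path)

contents = list_folder(os.path.abspath("data"))
```

Calling `list_folder(absolute_path, create=True)` would create an empty `data` folder in the working directory. Leave `create` off unless that is what you want.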

## Hardware & Software
Your computational environment consists of hardware (e.g., the physical machine and its Central Processing Unit) and software (e.g., the operating system, programming languages and their versions, and files), including the version of Python that is installed.

In [7]:
import sys

sys.version # view current version of Python

'3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]'
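`sys.version` covers the Python interpreter. The hardware and operating-system details mentioned above can be reported with the standard-library `platform` module together with `os.cpu_count()`. This is a sketch, and the exact strings vary from machine to machine. On some systems `platform.processor()` returns an empty string, which is why the code includes a fallback.

```python
import os
import platform

# Collect software and hardware details in one dictionary so they can be saved or shared
environment = {
    "os": f"{platform.system()} {platform.release()}",  # operating system and release
    "python": platform.python_version(),                # e.g. 3.6.5
    "machine": platform.machine(),                      # architecture, e.g. AMD64
    "processor": platform.processor() or "not reported",
    "cpu_count": os.cpu_count(),                        # may be None if undetermined
}

for key, value in environment.items():
    print(f"{key}: {value}")
```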

In [8]:
# List the installed packages and their versions in the current Python environment
!pip freeze

agate==1.6.1
agate-dbf==0.2.0
agate-excel==0.2.2
agate-sql==0.5.3
alabaster==0.7.10
anaconda-client==1.6.14
anaconda-navigator==1.9.7
anaconda-project==0.8.2
arcgis==1.5.0
asn1crypto==0.24.0
astroid==1.6.3
astropy==3.0.2
attrs==18.1.0
autocorrect==1.3.0
Babel==2.5.3
backcall==0.1.0
backports.shutil-get-terminal-size==1.0.0
basemap==1.1.0
beautifulsoup4==4.6.0
bitarray==0.8.1
bkcharts==0.2
blaze==0.11.3
bleach==2.1.3
bokeh==0.12.16
boto==2.48.0
Bottleneck==1.2.1
category-encoders==1.3.0
certifi==2019.6.16
cffi==1.11.5
chardet==3.0.4
click==6.7
click-plugins==1.0.4
cligj==0.5.0
cloudpickle==0.5.3
clyent==1.2.2
colorama==0.3.9
comtypes==1.1.4
conda==4.7.11
conda-build==3.10.5
conda-package-handling==1.3.11
conda-verify==2.0.0
contextlib2==0.5.5
coverage==4.5.1
cryptography==2.4.2
csvkit==1.0.3
cycler==0.10.0
Cython==0.28.2
cytoolz==0.9.0.1
dask==0.17.5
datashape==0.5.4
dbfread==2.0.7
decorator==4.3.0
descartes==1.1.0
distributed==1.21.8
docutils==0.14
entrypoints==0.2.3
et-xmlfile==1.0.1
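To make the environment shareable, the pinned package list can be saved to a file. Another machine could then recreate it with `pip install -r requirements.txt`. The sketch below produces `name==version` lines similar to `pip freeze` using `importlib.metadata`. It assumes Python 3.8 or later, which is newer than the 3.6.5 interpreter shown above. On 3.6, running `!pip freeze > requirements.txt` in a cell does the same job. The function names are hypothetical, and the sketch does not reproduce every detail of `pip freeze`, such as its handling of editable installs.

```python
from importlib.metadata import distributions  # requires Python 3.8+

def frozen_requirements():
    """Hypothetical helper: return sorted 'name==version' lines for installed distributions."""
    pins = set()  # a set removes duplicates if a package is found on several paths
    for dist in distributions():
        metadata = dist.metadata
        if "Name" not in metadata:  # skip broken installs that lack a name
            continue
        pins.add(f"{metadata['Name']}=={dist.version}")
    # sort case-insensitively, as in the pip freeze listing above
    return sorted(pins, key=str.lower)

def write_requirements(path="requirements.txt"):
    """Hypothetical helper: save the pinned list to a requirements file."""
    lines = frozen_requirements()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines
```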


In [9]:
sys.modules.keys()  # view the modules loaded in this session, including ones imported indirectly
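`sys.modules` contains every module loaded in the session, including many that were never imported directly, and it does not show version numbers. The sketch below keeps only top-level modules and reads their `__version__` attribute. This attribute is a common convention: pandas follows it, for example, so `pd.__version__` works. Not every module defines it, so modules without one are skipped. The helper name `imported_versions` is hypothetical.

```python
import sys

def imported_versions():
    """Hypothetical helper: map imported top-level modules to their __version__ strings."""
    versions = {}
    # copy the items first, since sys.modules can change while we inspect it
    for name, module in list(sys.modules.items()):
        if "." in name or module is None:  # skip submodules and placeholder entries
            continue
        version = getattr(module, "__version__", None)
        if isinstance(version, str):
            versions[name] = version
    return dict(sorted(versions.items()))

for name, version in imported_versions().items():
    print(f"{name}=={version}")
```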




## Bibliography

Barba, Lorena A. et al. (2019). Teaching and Learning with Jupyter. https://jupyter4edu.github.io/jupyter-edu-book/

The Turing Way Community. (2019). The Turing Way: A Handbook for Reproducible Data Science (Version v0.0.4). Zenodo. http://doi.org/10.5281/zenodo.3233986

## Further reading and resources

We highly recommend the materials referenced in the Bibliography. In addition, you may find the following useful:

- The Turing Way: A Handbook for Reproducible Data Science (Chapter 10): https://the-turing-way.netlify.app/reproducible_environments/reproducible_environments.html
- Python Virtual Environments: https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/26/python-virtual-env/

