# Shell basics

In this section we look at how to set up a project and how to work with the command line. You may have used graphical user interfaces for most of your coding career, but it is useful to know how to work through the shell. This is not often taught in economics programs, but it is very useful if you are considering a career in data science. We start by looking at the basic file structure of a computer.

**Note**: The output displayed is for the file structure on my computer. On your machine the paths and files will be different.

The first thing we will do is navigate to the directory that contains the files for our project. Normally, I first check what my **p**resent **w**orking **d**irectory is. You have encountered the idea of a working directory in R, so this should be familiar. We can do this as follows,

In [1]:
pwd 

'/home/dawie/Dropbox/2022/871-data-science/DataScience-871/notebooks'
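
As a small aside, the shell also keeps the working directory in a variable called `PWD`. The sketch below assumes a POSIX-style shell such as bash, and simply prints the same path in two ways.

```shell
# Print the present working directory.
pwd

# The shell stores the same path in the PWD variable.
echo "$PWD"
```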

On my computer the **pwd** is the `notebooks` folder. To see the files in the current working directory we use the `ls` command, which **l**i**s**ts them.

In [2]:
ls

01_shell_basics.ipynb     lecture-3.html         lecture-5.R
06_fundamentals_ml.ipynb  lecture-3.Rmd          lecture-5.Rmd
lecture-1.html            lecture-3-slides.html  lecture-8.html
lecture-1.Rmd             lecture-3-slides.Rmd   lecture-8.Rmd
lecture-2.html            lecture-4.html         lecture-9.html
lecture-2.Rmd             lecture-4.Rmd          lecture-9.Rmd
lecture-2-slides.html     lecture-4-slides.html
lecture-2-slides.Rmd      lecture-4-slides.Rmd
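
The `ls` command accepts options that change what is shown. The following is a minimal sketch of a few common ones. It builds a throwaway directory (the `demo` variable and the file names inside it are made up for illustration) so that it does not depend on any real files.

```shell
# Build a throwaway directory under the system temp location.
demo=$(mktemp -d)
touch "$demo/lecture-1.Rmd" "$demo/.hidden-file"
mkdir "$demo/notebooks"

ls "$demo"       # plain listing: names only, hidden entries are skipped
ls -a "$demo"    # -a also shows hidden entries (names starting with a dot)
ls -l "$demo"    # -l long format: permissions, owner, size, modification time
ls -F "$demo"    # -F marks directories with a trailing /
```

Options can be combined, so `ls -la` gives a long listing that includes hidden entries.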


We can see that the files listed here are Jupyter notebooks, `html` files and some `RMarkdown` files. Now let's navigate to the `research-project` folder. To do this we use the `cd` command to **c**hange **d**irectory. In this case I know that the `research-project` folder is located higher up in the file structure. To go up one level in the directory tree, we can use the following command,

In [3]:
cd ..

/home/dawie/Dropbox/2022/871-data-science/DataScience-871
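
The `..` entry always refers to the parent of the current directory, and it can be combined with folder names in a single path. Here is a small sketch, assuming a throwaway directory tree (the `demo` variable is hypothetical) that copies the folder names used in this section.

```shell
# Throwaway tree with two folders side by side.
demo=$(mktemp -d)
mkdir -p "$demo/DataScience-871/notebooks" "$demo/DataScience-871/research-project"

cd "$demo/DataScience-871/notebooks"
cd ..                     # up one level, into DataScience-871
pwd
cd notebooks              # back down into notebooks
cd ../research-project    # up one level and into a sibling folder in one step
pwd
```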


Once again, list the folders and files to get an idea of the directory structure.

In [4]:
ls

[0m[01;34mnotebooks[0m/  README.md  [01;34mresearch-project[0m/


We can see the folder we are looking for. However, let us go back to the **home** directory and navigate to the `research-project` folder from there. This is a good exercise that you can follow on your computer at home. We start with just typing `cd`. This will navigate us to the home directory for the current user. 

In [7]:
cd

/home/dawie
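
There are a few equivalent ways to reach the home directory. The sketch below assumes a POSIX-style shell in which the `HOME` variable is set, which is the usual case on Linux and macOS.

```shell
cd            # no argument: go to the home directory
pwd
cd ~          # the tilde expands to the home directory
pwd
cd "$HOME"    # the HOME variable holds the same path
pwd
```

All three `pwd` calls should print the same path.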


From this point I am going to navigate to my `DataScience-871` folder. From above you can see the absolute location of the folder. In my case I first need to change the directory to the `Dropbox` folder.  

In [9]:
cd Dropbox

/home/dawie/Dropbox


Now that I am in the Dropbox folder I want to go to my `2022` folder. I can do this as follows, 

In [10]:
cd 2022

/home/dawie/Dropbox/2022


Now let's take a quick look at the folders that are located in my 2022 folder. 

In [11]:
ls

[0m[01;34m318-macro[0m/  [01;34m871-data-science[0m/  [01;34m872-ats[0m/  [01;34m872-macro[0m/  [01;34mcomp-reading[0m/  [01;34mupenn[0m/


The relevant folder seems to be the `871-data-science` folder, so we change the directory once again. We keep doing this until we get to the `DataScience-871` folder. On your system this will be saved in a different location, and you should know where the folder is located on your computer. We can briefly discuss this in class. There are slight differences between operating systems.

In [12]:
cd 871-data-science

/home/dawie/Dropbox/2022/871-data-science


In [13]:
ls

[0m[01;34mDataScience-871[0m/


In [14]:
cd DataScience-871

/home/dawie/Dropbox/2022/871-data-science/DataScience-871


In [15]:
ls

[0m[01;34mnotebooks[0m/  README.md  [01;34mresearch-project[0m/


We see that one of the folders is called `research-project`. That is where we want to navigate, so we use the `cd` command along with the folder name as follows. 

In [5]:
cd research-project

/home/dawie/Dropbox/2022/871-data-science/DataScience-871/research-project


You can now see that the **pwd** is the `research-project` folder. Let us explore this folder to see what it contains. 

In [6]:
ls

[0m[01;34mbin[0m/  [01;34mdata[0m/  [01;34mdocs[0m/  README.md  [01;34mresults[0m/


In this folder structure you can see a broad template for organising a small project. There is a `README` file that gives us the basic information about the project. You might often see `licence`, `conduct` and `citation` files in projects, but we won't be dealing with those in detail in this course. The only boilerplate file that you need to concern yourself with is the `README` document, which we will talk about a bit more when we deal with **version control**. You will notice that this has the `.md` extension, which indicates that this is a Markdown file. This entire notebook was written in Markdown and I will talk about the format briefly in the lecture.

The directories for this project are organised by purpose. The runnable programs are located in the `bin` folder. This includes your shell scripts and R / Julia scripts. This folder is also sometimes named the `src` folder in projects. 

Our raw data goes into the `data` folder and the data in this folder is never modified. This is the original raw data. 
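
One possible way to guard against accidental edits (an assumption here, not part of the template itself) is to remove write permission from the raw files. The sketch below uses a throwaway directory and a hypothetical file name, `survey.csv`.

```shell
# Throwaway data folder; the file name survey.csv is hypothetical.
raw=$(mktemp -d)/data
mkdir -p "$raw"
echo "id,value" > "$raw/survey.csv"

# Remove write permission for everyone, so edits need a deliberate chmod first.
chmod a-w "$raw/survey.csv"

# The permission string in the first column should no longer contain a w.
ls -l "$raw/survey.csv"
```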

Results are put in the `results` folder. This includes the cleaned data, figures and other components that are created from the contents of the `bin` and `data` folders.
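
To recreate this layout for a project of your own, something like the following should work. It is a minimal sketch that builds the template inside a throwaway directory (the `project` variable is hypothetical). In practice you would replace that path with the location where you want the project to live.

```shell
# Place the project inside a throwaway directory for this illustration.
project=$(mktemp -d)/research-project

# -p creates any missing parent folders and does not complain if a folder exists.
mkdir -p "$project/bin" "$project/data" "$project/docs" "$project/results"

# Start with an empty README that can be filled in later.
touch "$project/README.md"

# Directories are marked with a trailing / in the listing.
ls -F "$project"
```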