## ECMT2160 - Econometric Analysis
### Semester 2 2018
#### Author: Chris Hyland

Welcome to your first Stata tutorial powered by Jupyter Notebooks! This is a quick crash course in using Stata more formally, through commands and Do-files rather than menus. Note that I am writing this on macOS, so Windows users may see slightly different menus and windows in Stata.

### What is Stata and why do I need to learn it?
Stata is a data analysis and statistical software package widely used in the social sciences. Its easy-to-learn command language and the strong technical support from StataCorp are among its biggest advantages over other programming languages, and it is a good language for people who aren't familiar with programming.

As aspiring economists and econometricians, you will spend a lot of your careers analysing data you have been given. Investing the time to learn tools such as Stata will go a **long** way and make life much easier for you.

### Awesome! How do I get started?
When you click on the Stata icon, you should be greeted by a GUI that looks like this:

![image.png](attachment:image.png)

The box at the bottom titled _Command_ is where you type Stata commands (duh). So let's try entering our first command:

```Stata
display "Hi everyone!"
```
> Hi everyone!

Note the quotation marks ("") around the text. They tell Stata that you are typing text rather than issuing more commands.

As you can see, the _display_ command prints whatever you give it. It also evaluates mathematical expressions, so if you want to do some math in Stata's Command window, you again use the display command.

```Stata
display 2+2
```
> 4

Finally, you can combine both math and words (otherwise known in programming as _strings_).

```Stata
display 2+2 " is my favourite number"
```
> 4 is my favourite number
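Here are a few more things _display_ can do on its own, with no dataset loaded. They use Stata's standard arithmetic operators and the built-in `sqrt()` function. The text after // on each line is a comment (covered in the Do-file section below). Leave it out if you type these lines into the Command window one at a time.

```Stata
// More things -display- can do without any data loaded
display 10/4                    // division
display 2^3                     // powers use the ^ operator
display sqrt(16)                // built-in functions such as sqrt()
display "Half of 9 is " 9/2     // text followed by an expression
```

These should print 2.5, 8, 4 and "Half of 9 is 4.5".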

### I think I'm getting the hang of this! But hang on, what should I do if I want to change something I typed earlier? 
Congrats, and good question! This is where **Do-files** come in super handy. Whilst it may seem simple enough to just type a few commands into the Command window, this is not ideal for writing programs that require many commands. If you look at the top, you should see a button called _Do-File Editor_:
![image.png](attachment:image.png)

Alternatively, you can click on File->New->Do-File.


You should see an empty window open up. Now try typing

```Stata
display "Hey there"
```
Now look for a button that looks like this to click on:
![image.png](attachment:image.png)

Your code should execute now! Generally, when you work with Do-files, you write all your commands in the Do-file and, more importantly, _comment_ (explain) what is going on in your code. To write a comment, start it with //, which tells Stata that it is just a comment and not actually a command. Here is an example of what your Do-file may contain:

```Stata
//ECMT2160 Tutorial
//Author: Chris Hyland
display "Hi everyone"
display "It is very exciting to be teaching you ECMT2160!"
display "Yay"
```
Now when you execute this, only the last 3 lines are executed. The two comment lines are ignored.
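Besides //, Stata recognises a few other comment markers in Do-files. The short sketch below shows them. A line starting with * is a comment, /* ... */ can span several lines, and /// at the end of a line continues a long command onto the next line. Run it from a Do-file, since these markers are meant for Do-files.

```Stata
* A line that starts with an asterisk is a comment
display "Hi everyone"    // an inline comment needs a space before the //

/* A block comment can
   span several lines */

* Three slashes at the end of a line continue a command on the next line
display "It is very exciting to be teaching you " ///
    "ECMT2160!"
```

The last command prints "It is very exciting to be teaching you ECMT2160!" on a single line.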

Professional economists often collaborate by sending each other their commented Do-files, which lets their peers run and check their code. One final bit of advice: you generally don't include the output of your code in your Do-file, but for your assignments in this course it is fine to simply copy and paste your output into the Do-file and then **comment it out**.
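For example, an assignment Do-file might paste a command's output directly below that command, inside a /* ... */ block, so that Stata skips the output when the file is run again. Here is a minimal sketch using a command from earlier (the "Question 1" label is just for illustration):

```Stata
// Question 1: show my favourite number
display 2+2 " is my favourite number"

/* Output:
4 is my favourite number
*/
```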

### That's awesome! How do I read in a dataset that I would like to analyse?

This part may be a bit tricky at first, but bear with me! If you want to load a dataset with commands rather than by clicking on import, you first need to find the path of the file. The path is simply where the file is located on your computer. Here's how to find the path of a file for [Windows](https://www.pcworld.com/article/251406/windows_tips_copy_a_file_path_show_or_hide_extensions.html) and for [Mac](http://osxdaily.com/2013/06/19/copy-file-folder-path-mac-os-x/).

Here is an example of me specifying the path to my Desktop and reading in a file.


```Stata
// Store the path to my Desktop in a global macro called datadir
global datadir "/Users/ChristopherHyland/Desktop"

// Read in the data from my desktop
use "$datadir/smoke.dta", clear
```


That should have read in the smoking (smoke) dataset. Note that this process is easiest with a .dta file (Stata's own data format), which _use_ opens directly. A csv file has to be imported instead.
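If your data only comes as a csv file, you need `import delimited` rather than `use`. The sketch below is self-contained under a few assumptions. It needs Stata 13 or later. It uses Stata's bundled example dataset `auto` (loaded with `sysuse`) in place of the smoking data. It writes the data out as a csv file called auto_copy.csv (a name chosen purely for illustration) in the current working directory, then reads it back in.

```Stata
// Load Stata's built-in example dataset (stands in for your own data)
sysuse auto, clear

// Write it out as a comma-separated file in the current working directory;
// nolabel keeps numeric codes instead of value labels such as "Foreign"
export delimited using "auto_copy.csv", nolabel replace

// Read the csv back in; varnames(1) takes variable names from the first row
import delimited using "auto_copy.csv", varnames(1) clear

// Save a .dta copy so that next time a plain -use- is enough
save "auto_copy.dta", replace
```

With your own file, only the import line is needed, for example with the path built from `$datadir` as above.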

### Alright, I managed to read the data in. What now?

When you first load in your dataset, it is a good idea to do some general exploring. A good start is to compute basic summary statistics for each variable. To do so, just type

```Stata
summarize
```

![image.png](attachment:image.png)
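_summarize_ also accepts a list of variables and a `detail` option, and _describe_ lists each variable's name, storage type and label. Here is a sketch using Stata's bundled `auto` dataset so that it runs without smoke.dta. With the smoking data you would write, for example, `summarize cigs age, detail`.

```Stata
// Stata's bundled example dataset, used here so the snippet runs on its own
sysuse auto, clear

describe                   // variable names, storage types and labels
summarize price mpg        // statistics for selected variables only
summarize price, detail    // adds percentiles, skewness and kurtosis
```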

### Awesome! Would a frequency table be a good idea too?
Good point! To create a frequency table, we can use the tabulate command. Let's compute a frequency table for the education variable.

```Stata
tabulate educ
```

![image.png](attachment:image.png)
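_tabulate_ can also count missing values and cross-tabulate two variables. Here is a sketch using Stata's bundled `auto` dataset, where `rep78` has some missing values. With the smoking data, a two-way table would look like `tabulate educ restaurn`.

```Stata
// Stata's bundled example dataset, used here so the snippet runs on its own
sysuse auto, clear

tabulate rep78, missing    // one-way table that also counts missing values
tabulate rep78 foreign     // two-way table (cross-tabulation)
```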

### All these tables are boring, could I explore the data in a more visual manner?

A scatterplot is a great tool for analysing your data. To create a scatterplot, enter:

```Stata
scatter y x
```
where y is the variable you want on the y-axis and x is the variable you want on the x-axis. Here is an example:

```Stata
scatter cigpric educ
```
![image.png](attachment:image.png)
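You can overlay a least-squares fitted line on a scatterplot with _twoway_, and save the graph to a file with _graph export_. Here is a sketch using Stata's bundled `auto` dataset. The file name price_mpg.png is only an illustration, and the /// line continuations need a Do-file. For the smoking data, the equivalent plot would be `twoway (scatter cigpric educ) (lfit cigpric educ)`.

```Stata
// Stata's bundled example dataset, used here so the snippet runs on its own
sysuse auto, clear

// Overlay the scatterplot with its least-squares fitted line and add titles
twoway (scatter price mpg) (lfit price mpg), ///
    title("Price against fuel economy") ///
    ytitle("Price") xtitle("Miles per gallon")

// Save the graph in memory as a PNG file in the current working directory
graph export "price_mpg.png", replace
```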

### Great! What now? 

With all that out of the way, we can begin with some **modelling**! Let's begin with **regressions**.

To run a simple linear regression (after checking whether the assumptions hold), we can use the reg command, which is short for regress. The _dependent variable_ always comes first, straight after reg, followed by any independent variables we want to include.

```Stata
reg cigs age
```

![image.png](attachment:image.png)

Let's interpret this!

1) \_cons is the intercept. The coef column tells us the value of the coefficients on those variables. Here, age's coefficient is -0.033 with a standard error of 0.02838. 

2) The t column gives the test statistic for the null hypothesis $\beta_1 = 0$, i.e. that the coefficient equals 0. It is the coefficient divided by its standard error. The P>|t| column gives the p-value of this test, and the last 2 columns give the 95% confidence interval.

3) We can see other useful information, such as the R-squared and adjusted R-squared, in the top right-hand side. F(1, 805) is the F-test of whether all the slope coefficients (every coefficient except the intercept) are jointly equal to 0. Note that since we only have 1 independent variable in our model, this is equivalent to the t-test in the last point: the F statistic is the square of the t statistic.

4) The top left gives more information, such as the [total sum of squares](https://en.wikipedia.org/wiki/Total_sum_of_squares) (the _Total_ row), the [explained sum of squares](https://en.wikipedia.org/wiki/Explained_sum_of_squares) (the _Model_ row) and the [residual sum of squares](https://en.wikipedia.org/wiki/Residual_sum_of_squares) (the _Residual_ row), which you should be familiar with from earlier ECMT courses.
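You can check the relationships in points 2 to 4 directly from the results Stata stores after _regress_. `_b[...]` and `_se[...]` hold each coefficient and its standard error. `e(F)`, `e(r2)`, `e(mss)` and `e(rss)` hold the F statistic, the R-squared, the model (explained) sum of squares and the residual sum of squares, and `e(df_r)` holds the residual degrees of freedom. The formulas being checked are $t = \hat\beta_1 / \operatorname{se}(\hat\beta_1)$, $F = t^2$ with a single regressor, and $R^2 = \text{ESS}/\text{TSS}$. Here is a sketch using Stata's bundled `auto` dataset, with `price` regressed on `mpg` standing in for `cigs` on `age`:

```Stata
// Stata's bundled example dataset, used here so the snippet runs on its own
sysuse auto, clear
regress price mpg

// t statistic for H0: beta_1 = 0 is the coefficient over its standard error
display _b[mpg] / _se[mpg]

// Two-sided p-value, matching the P>|t| column
display 2*ttail(e(df_r), abs(_b[mpg] / _se[mpg]))

// With a single regressor, the square of that t statistic equals the F statistic
display (_b[mpg] / _se[mpg])^2
display e(F)

// R-squared is the explained (model) sum of squares over the total sum of squares
display e(mss) / (e(mss) + e(rss))
display e(r2)
```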

To run a multiple linear regression, just list more independent variables:

```Stata
reg cigs age educ income
```
![image.png](attachment:image.png)
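With several regressors, the _test_ command runs an F-test on any subset of coefficients after _regress_. For the model above, this would be `test educ income`. Here is a sketch using Stata's bundled `auto` dataset so that it runs on its own:

```Stata
// Stata's bundled example dataset, used here so the snippet runs on its own
sysuse auto, clear
regress price mpg weight foreign

// F-test of H0: the coefficients on weight and foreign are both zero
test weight foreign

// Testing all slope coefficients together reproduces the F statistic in the header
test mpg weight foreign
display r(F)
display e(F)
```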

### I see! I notice there seems to be a recurring pattern in Stata's commands?

Yes, you're right! Once the format of

```Stata
model y x
``` 

becomes familiar to you, it's easy to pick up a lot of other models. For example, a logistic regression is just:

```Stata
logit restaurn educ age
```

![image.png](attachment:image.png)

You may notice the few lines at the beginning about iterations and the log likelihood. Don't worry about them for now. In a nutshell, Stata uses [numerical analysis](https://en.wikipedia.org/wiki/Numerical_analysis) (an iterative search) to find the coefficient values that maximise the log likelihood, which is how [maximum likelihood estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) works. We'll learn more about this later on in the course!
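The same "command y x" pattern carries over to other binary-outcome models such as _probit_ (for the smoking data, `probit restaurn educ age`). The `nolog` option hides the iteration log, and the stored results `e(ll)` and `e(converged)` report the maximised log likelihood and whether the iterations converged. Here is a sketch using Stata's bundled `auto` dataset, where `foreign` is a 0/1 variable:

```Stata
// Stata's bundled example dataset, used here so the snippet runs on its own
sysuse auto, clear

// Same "command y x" pattern: a probit model for a 0/1 outcome
probit foreign mpg weight

// Logit again, with the iteration log switched off via the nolog option
logit foreign mpg weight, nolog

// The maximised log likelihood and whether the iterations converged (1 = yes)
display e(ll)
display e(converged)
```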

### How intuitive!

Exactly! That's why Stata is still the language of choice for many researchers. That's all we have time for today, but if you want to explore further, feel free to read the [user guide](https://www.stata.com/manuals13/u.pdf) published by StataCorp. It's a fantastic bedtime read!