# Chapter 2 Import a Dataset

## 2.1 Python Modules & Import Data From Modules

A *module* is a *Python* file that contains functions and variables, and a *package* is a collection of related modules. You can import these modules and reuse their code to solve your problem.
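
As a small illustration, here is a sketch using the standard-library `math` module, chosen only because it ships with every Python installation. It shows a variable and a function that live inside a module, plus three common ways to import one:

```python
# Import a whole module and access its contents with dot notation
import math

print(math.pi)        # a variable defined in the math module
print(math.sqrt(16))  # a function defined in the math module

# Import a module under a shorter alias
import math as m

print(m.sqrt(25))

# Import a single name from a module
from math import sqrt

print(sqrt(9))
```

The `import ... as ...` form is the one used below to load the wooldridge package.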

One advantage of the *Anaconda* distribution of *Python* is that it comes with many commonly used modules preinstalled, so we do not need to spend time downloading and managing them ourselves. If you want to add a new package to the root environment, you can use either the *pip* or the *conda* command-line tool, both of which come with *Anaconda*.

Open a *Terminal* in JupyterLab by clicking the <kbd>+</kbd> button in the upper-right corner of the screen, the same step you use to create a new Notebook. Once the *Launcher* window opens, find *Terminal* under the *Other* section.

---

![Create a terminal](images/chapter2/Create_A_Terminal.png)

---

In the terminal, type `pip install modulename` or `conda install modulename` to install that module into the root environment of Anaconda. One difference between *pip* and *conda* is that they download modules from different online repositories: *pip* uses the Python Package Index (PyPI), while *conda* uses Anaconda's package channels.

Let's install the **wooldridge** package into the default environment. Type the following command in the Terminal:

`pip install wooldridge` 
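
If you are unsure whether the installation worked, you can check from Python itself. Below is a minimal sketch using the standard library's `importlib`; the helper name `is_installed` is hypothetical and not part of any package:

```python
import importlib
import importlib.util


def is_installed(module_name):
    """Return True if Python can find a module with this name in the current environment."""
    # Clear cached directory listings so a package installed after startup is noticed
    importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


print(is_installed("json"))        # part of the standard library, so True
print(is_installed("wooldridge"))  # True once `pip install wooldridge` has succeeded
```

Keep in mind that the package must be installed into the same environment that runs your notebook kernel; otherwise both this check and the import will fail even if the install command itself succeeded.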

Now, we can import this module. Recall how to import a module.

```python
import wooldridge as woo
```

*Don't forget to execute the code by pressing <kbd>⌃CTRL</kbd>+<kbd>return</kbd> or <kbd>⌥option</kbd>+<kbd>return</kbd>.*

Coding is usually less intuitive than a graphical user interface: there are no drop-down menus or labeled buttons. Instead, we need to do it the old-fashioned way and read the manual (i.e., the **API documentation**). Search for "python wooldridge" to find the [wooldridge documentation](https://pypi.org/project/wooldridge/), which explains how to use this module.

```python
# Here I want to show you another way to add comments to your code.
# Instead of using Markdown cells in JupyterLab, you can write comments in code:
# Python treats anything after a # as a comment.

# import the dataset called 'wage1' and assign it to a variable called wage1
wage1 = woo.data("wage1")

# print the type of this object
print(type(wage1))
```

```
<class 'pandas.core.frame.DataFrame'>
```

Python is an object-oriented programming language, which means our code is organized around **objects**. An object in Python is similar to an object in the real world. For example, this notebook is an object, the blackboard is an object, and your computer is an object. Similarly, the wage1 dataset is an object, the number `1` is an object, and the string `"Hello World"` is an object. We can group objects into **classes**, so that objects in the same class share common features. The documentation of a class describes the **attributes** (properties such as name, length, and size) and the **methods** (what the object can do, such as `go()`, `turn()`, and `move()`) of all objects that belong to that class.
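
To make attributes and methods concrete, here is a short sketch. It uses a tiny hypothetical DataFrame as a stand-in for wage1, so it runs without the wooldridge package:

```python
import pandas as pd

# Every value in Python is an object, and type() reports its class
print(type(1))              # <class 'int'>
print(type("Hello World"))  # <class 'str'>

# Methods can be called on any object, even a plain string
print("Hello World".upper())  # HELLO WORLD

# A tiny hypothetical DataFrame standing in for wage1
df = pd.DataFrame({"wage": [3.10, 3.24, 3.00], "educ": [11, 12, 11]})
print(type(df))  # <class 'pandas.core.frame.DataFrame'>

# Attributes describe the object and are accessed without parentheses
print(df.shape)             # (3, 2): 3 rows, 2 columns
print(df.columns.tolist())  # ['wage', 'educ']

# Methods are actions the object can perform and are called with parentheses
print(df.head(2))
```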

We can see that *wage1* is a pandas DataFrame (i.e., this object belongs to the DataFrame class). To see what it can do, we need to look up the [pandas DataFrame documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).
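
Much of the same documentation also ships with the library as *docstrings*, so you can read it without leaving the notebook. A minimal sketch (in JupyterLab, running `pd.DataFrame.head?` in a cell shows similar text):

```python
import pandas as pd

# Functions and methods carry their documentation in the __doc__ attribute (the docstring)
doc = pd.DataFrame.head.__doc__

# Print only the beginning, since the full docstring is long
print(doc[:400])

# help() prints the full documentation, including the parameter list
# help(pd.DataFrame.head)
```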

Try to locate the head() method. 

In the documentation, we find that the `head` method takes one *parameter*, `n` (which defaults to 5), and returns an object of the same type as the caller, in this case a DataFrame.

---

![DataFrame.head](images/chapter2/DataFrame.head.png)

---

To set this parameter, we can either pass it as a keyword argument, `wage1.head(n=5)`, or pass the value by position and omit the parameter name and equals sign:
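
A quick check on a small hypothetical DataFrame (not the wage1 data) suggests the two spellings are interchangeable, and shows that `head()` falls back to 5 rows when `n` is omitted:

```python
import pandas as pd

# A small hypothetical DataFrame with 10 rows
df = pd.DataFrame({"x": range(10)})

by_keyword = df.head(n=3)  # name the parameter explicitly
by_position = df.head(3)   # pass the value by position
print(by_keyword.equals(by_position))  # True

# If n is omitted, head() uses its default of n=5
print(len(df.head()))  # 5
```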

```python
wage1.head(5)
```

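|   | wage | educ | exper | tenure | nonwhite | female | married | numdep | smsa | northcen | ... | trcommpu | trade | services | profserv | profocc | clerocc | servocc | lwage | expersq | tenursq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3.1 | 11 | 2 | 0 | 0 | 1 | 0 | 2 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.131402 | 4 | 0 |
| 1 | 3.24 | 12 | 22 | 2 | 0 | 1 | 1 | 3 | 1 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1.175573 | 484 | 4 |
| 2 | 3.0 | 11 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1.098612 | 4 | 0 |
| 3 | 6.0 | 8 | 44 | 28 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1.791759 | 1936 | 784 |
| 4 | 5.3 | 12 | 7 | 2 | 0 | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.667707 | 49 | 4 |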


This method returns the first `n` rows (here, 5) of the wage1 dataset. We will continue discussing descriptive data analysis with pandas DataFrames in the next chapter. For now, let's focus on how to load data from other types of files.

## 2.2 Import Data From Different Data Files

Common file name extensions for data files include RAW, CSV, and TXT.
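
As a preview, pandas can read all three kinds of text files with `pd.read_csv`. The sketch below assumes a CSV file with a header row and a RAW/TXT file whose values are separated by spaces with no header row (a common layout, but check your own files). The file contents and column names are illustrative, and they are held in memory with `io.StringIO` so the example runs without any files on disk:

```python
import io

import pandas as pd

# Hypothetical file contents, kept in memory so no files are needed
csv_text = "wage,educ,exper\n3.10,11,2\n3.24,12,22\n3.00,11,2\n"
# Assumed RAW/TXT layout: values separated by spaces, no header row
raw_text = "3.10 11 2\n3.24 12 22\n3.00 11 2\n"

# CSV: comma-separated, column names taken from the first line
df_csv = pd.read_csv(io.StringIO(csv_text))

# RAW/TXT: split on any run of whitespace and supply the column names ourselves
df_raw = pd.read_csv(
    io.StringIO(raw_text),
    sep=r"\s+",
    header=None,
    names=["wage", "educ", "exper"],
)

print(df_csv)
print(df_raw.equals(df_csv))  # True: both sources hold the same table
```

With real files, you would pass a path such as `"wage1.csv"` (a hypothetical file name) in place of the `io.StringIO(...)` object.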