# Tech #5 Selecting, Filtering, and Creating Data

- This file illustrates how to select, filter, and create data.

**Import pandas package**

In [None]:
import pandas as pd

**Load the dataset**

In [None]:
df19 = pd.read_csv('Compustat_fy2019.csv', parse_dates = ['datadate'])

**Navigate the dataset**

In [None]:
df19.head()

---
## Selecting Data

### Selecting one variable (column)
```Python
df19['at']
```
- Specify the variable name with quotes around it in the brackets.

### Selecting multiple variables (columns)
```Python
df19[['conm','at']]
```
- Pass a list of variable names (written in its own pair of brackets) inside the selection brackets.

**Note: These statements only display the selected data. They do not change the `df19` dataset at all.**
If you want to keep what you select as a new dataset, assign it to a new name, as the following code shows.
```Python
df19_1 = df19[['conm','at']]
```
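As a minimal sketch of the selections above, the example below uses a small made-up DataFrame called `toy` (a hypothetical stand-in for `df19`, with invented values) so it runs without the Compustat file. It shows that single brackets give back a Series, double brackets give back a DataFrame, and the original data is left unchanged.

```Python
import pandas as pd

# Hypothetical mini dataset; the values are made up for illustration only.
toy = pd.DataFrame({
    'conm': ['Alpha Inc', 'Beta Corp', 'Gamma Ltd'],
    'at': [1200.0, 8500.0, 300.0],
    'ni': [100.0, 900.0, -20.0],
})

one_col = toy['at']             # single brackets return a Series
two_cols = toy[['conm', 'at']]  # double brackets return a DataFrame

print(type(one_col).__name__)   # Series
print(type(two_cols).__name__)  # DataFrame
print(toy.shape)                # (3, 3): selecting did not change toy
```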

### Selecting observations (rows)
```Python
df19[0:10]
```
- Selects the first through the tenth observation.
- `df[a:b]` gives you the observations at positions `a` through **b-1**.
- If you omit `a`, the slice starts at position zero; if you omit `b`, it runs through the last observation. So `df[0:10]` is equivalent to `df[:10]`.
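The slicing rules above can be checked on a tiny example. This is only a sketch: `toy` is a hypothetical five-row DataFrame with the default 0, 1, 2, ... index, not the Compustat data.

```Python
import pandas as pd

# Hypothetical five-row dataset with the default integer index.
toy = pd.DataFrame({'conm': ['A', 'B', 'C', 'D', 'E'],
                    'at': [10, 20, 30, 40, 50]})

first_three = toy[0:3]   # rows at positions 0, 1, 2 (position 3 is excluded)
same_rows = toy[:3]      # omitting the start means "from position 0"
last_two = toy[3:]       # omitting the end means "through the last row"

print(first_three)
print(first_three.equals(same_rows))  # True
print(len(last_two))                  # 2
```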

### Selecting both observations and variables at the same time
```Python
df19[0:10][['conm','at']]
```
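The chained selection above works well for viewing data. As an alternative sketch (not part of the original example), pandas also offers `.loc`, which takes row labels and column names in a single step. One difference to keep in mind: `.loc` includes the end label, so `0:1` in `.loc` matches `0:2` in a plain slice when the index is the default 0, 1, 2, .... The `toy` DataFrame below is hypothetical.

```Python
import pandas as pd

# Hypothetical mini dataset; the values are made up for illustration only.
toy = pd.DataFrame({
    'conm': ['A', 'B', 'C', 'D'],
    'at': [10.0, 20.0, 30.0, 40.0],
    'ni': [1.0, 2.0, 3.0, 4.0],
})

chained = toy[0:2][['conm', 'at']]       # slice rows first, then pick columns
with_loc = toy.loc[0:1, ['conm', 'at']]  # .loc uses labels and INCLUDES the end label

print(chained.equals(with_loc))  # True
```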

---
## Filtering Data

### Filtering data on one condition
```Python
df19[df19['exchg']==14]
```
- Selects companies that are listed on NASDAQ by specifying the corresponding condition (i.e., `df19['exchg']==14`) in the brackets.
- Python comparison operators
    - Equal (==)
    - Not equal (!=)
    - Greater than (>)
    - Greater than or equal (>=)
    - Less than (<)
    - Less than or equal (<=)
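To see what a condition does, it helps to look at it on its own. In this sketch, `toy` is a hypothetical DataFrame with made-up values. Comparing a column to a value produces a Series of `True`/`False` values, and putting that Series in the brackets keeps only the rows marked `True`.

```Python
import pandas as pd

# Hypothetical mini dataset; exchg == 14 stands for NASDAQ as in the text.
toy = pd.DataFrame({
    'conm': ['Alpha Inc', 'Beta Corp', 'Gamma Ltd'],
    'exchg': [14, 11, 14],
    'at': [1200.0, 8500.0, 300.0],
})

mask = toy['exchg'] == 14          # one True/False value per row
print(mask.tolist())               # [True, False, True]

nasdaq = toy[mask]                 # keeps rows where the mask is True
not_nasdaq = toy[toy['exchg'] != 14]
print(nasdaq['conm'].tolist())     # ['Alpha Inc', 'Gamma Ltd']
```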

### Filtering data on multiple conditions
```Python
df19[(df19['exchg']==14) & (df19['at']>=5000)]
```
- Selects companies that are listed on NASDAQ and have at least 5 billion in total assets (`at` is measured in millions).
- You need to put parentheses around each condition, and then connect these conditions with bitwise operators.
- Python bitwise operators
    - And (&)
    - Or ( | )

- You can also store the condition in its own variable first and then refer to that variable inside the brackets.
```Python
condition = (df19['exchg']==14) & (df19['at']>=5000)
df19[condition]
```
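The sketch below contrasts `&` and `|` on a hypothetical DataFrame named `toy` (made-up values). The parentheses matter because `&` and `|` are evaluated before `==` and `>=` in Python, so leaving them out changes the meaning of the expression and usually raises an error.

```Python
import pandas as pd

# Hypothetical mini dataset; the values are made up for illustration only.
toy = pd.DataFrame({
    'conm': ['A', 'B', 'C', 'D'],
    'exchg': [14, 11, 14, 11],
    'at': [8000.0, 6000.0, 300.0, 100.0],
})

on_nasdaq = toy['exchg'] == 14
is_large = toy['at'] >= 5000

both = toy[on_nasdaq & is_large]     # rows meeting BOTH conditions
either = toy[on_nasdaq | is_large]   # rows meeting AT LEAST ONE condition

print(both['conm'].tolist())         # ['A']
print(either['conm'].tolist())       # ['A', 'B', 'C']
```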


---
## Creating Data

### Creating a new variable
```Python
df19['roa'] = df19['ni'] / df19['at']
```
- Creates a new variable called `roa` by dividing net income by total assets.
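Unlike selecting and filtering, assigning to a new column name does change the dataset. The following sketch repeats the calculation on a hypothetical DataFrame named `toy` with made-up values, so the result can be checked by hand.

```Python
import pandas as pd

# Hypothetical mini dataset; the values are made up for illustration only.
toy = pd.DataFrame({
    'conm': ['Alpha Inc', 'Beta Corp', 'Gamma Ltd'],
    'ni': [100.0, 900.0, -20.0],
    'at': [1000.0, 9000.0, 400.0],
})

# The division is done row by row, and the result is stored as a new column.
toy['roa'] = toy['ni'] / toy['at']

print(list(toy.columns))      # ['conm', 'ni', 'at', 'roa']
print(toy[['conm', 'roa']])
```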

---
## Python group project due on Friday 2/11 @ 8AM
Please create a **new Jupyter Notebook file** and then perform the following analyses using Python code and the **2018** Compustat file. For every question, please write the code and execute it to get the results. Please also **explain your findings by using comments or by creating a Markdown cell right after the returned results** (you can read the tips below on how to create Markdown cells).

After you finish the analyses, please **make sure every cell is executed and all the returned results are presented**. Then you can save the file and submit it to Canvas. **Each group only has to submit one file, but everyone should make sure they are able to write and run the code.**
- Describe the dataset, *Compustat_fy2018.csv*
    - How many observations are there?
    - How many variables are there? What are those variables?
- Find out the SIC industry code for General Motors Co. (tic == 'GM')
- Did GM outperform its industry **average** in fiscal year 2018 on the following two measures?
    - ROE (net income / total equity)
    - Asset turnover ratio (revenue / total assets)

### Tips: 
### How to create and write in Markdown cells
- Create a new cell by clicking on the plus icon in the menu (second icon next to the "save" one).
- Select "Markdown" instead of "Code" in the drop-down menu next to the last icon (a small keyboard).
- You can type your explanatory text in Markdown cells. After you finish typing, run the cell to have the text rendered.
- Markdown is a lightweight markup language that you can use to create rich text with different formats and sizes. If you want to learn the basic syntax, you can go to this [website](https://commonmark.org/help/).

### Command keyboard shortcuts
- There are keyboard shortcuts for major commands, such as creating a new cell, entering the edit mode, and running the cell. You can check those shortcuts by clicking on the small keyboard icon on the far right of the menu.