# Working with data in the command line

If you haven't used the command line much, you will this summer. It is the primary interface to our Amazon (AWS) computers, and it comes with many tools that help you analyze data quickly and reliably. In this module, we cover some of the most commonly used ones.

## Goals
- Learn commonly used command-line tools
- Learn how to combine them with pipes (`|`)


## Tasks
- Navigate to the directory where your data-portal files reside
- Look at the top and bottom of the dataset
- Generate summary statistics for the building-permits dataset
- Search for all the dollar signs in the permits data and temporarily remove them
- Write a test bash script
- Write a test bash script

## Tools
- ls
- less
- cat
- head
- tail
- csvkit (the commands we look at are csvlook, csvstat, and csvsql)
- grep
- wc
- sed
- tr

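## What data do we have?
Navigate to the directory with the data and list its files:
```
cd        # with no argument, cd goes to your home directory; give it a path to go elsewhere
ls -lha   # -l long listing, -h human-readable sizes, -a include hidden files and directories
```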

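## Look at the data
Look inside the files:
- `cat` prints the whole file.
- `head` prints the first ten lines by default.
- `tail` prints the last ten lines by default.
- `less` shows one screen at a time. The `-S` option chops long lines instead of wrapping them, so each wide CSV row stays on one line; scroll sideways with the arrow keys and press `q` to quit.

```
cat building_permits.csv
less -S building_permits.csv   # same result as: cat building_permits.csv | less -S

head building_permits.csv
tail building_permits.csv
```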
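To see how `head` and `tail` treat a CSV, here is a minimal sketch on a tiny, hypothetical `sample_permits.csv` (made-up columns and values, written to a temporary directory so a real `building_permits.csv` is never overwritten). `-n` sets how many lines to print, and `tail -n +2` prints from line 2 onward, which skips the header.

```shell
# Hypothetical sample data, written to a temporary directory so that a real
# building_permits.csv is never overwritten.
dir=$(mktemp -d)
cat > "$dir/sample_permits.csv" <<'EOF'
permit_id,work_description,estimated_cost
1001,ERECT REAR PORCH,$5000
1002,REPAIR ROOF,$12000
1003,REPLACE FRONT PORCH STAIRS,$3500
1004,INTERIOR RENOVATION,$20000
EOF

header=$(head -n 1 "$dir/sample_permits.csv")      # only the column names
last_two=$(tail -n 2 "$dir/sample_permits.csv")    # the final two lines
data_only=$(tail -n +2 "$dir/sample_permits.csv")  # every line except the header

echo "Header: $header"
echo "Last two:"; echo "$last_two"
echo "Data rows:"; echo "$data_only"

rm -r "$dir"   # clean up the temporary directory
```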

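## csvkit
[csvkit](http://csvkit.readthedocs.org/) is a suite of tools for understanding and manipulating CSV files at the command line.
```
pip install csvkit
csvstat building_permits.csv   # summary statistics for every column
csvlook building_permits.csv   # render the CSV as an aligned, readable table
```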

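## Pipe order can matter
`csvlook` reads its entire input before it prints anything. In the first command below, it processes the whole dataset, and `head` then throws away all but the first ten lines of its output. In the second, `head` passes only the first ten lines of the file to `csvlook`, so the second approach is much faster on a large file.
```
csvlook building_permits.csv | head | less -S   # slow: csvlook reads every row
head building_permits.csv | csvlook | less -S   # fast: csvlook reads ten lines
```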
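Order can change the result as well as the speed. A minimal sketch using `seq` and `sort` in place of `csvlook` (which may not be installed everywhere): sorting first and then taking three lines gives the three largest numbers overall, while taking three lines first only reorders those three.

```shell
# seq prints 1 through 100, one number per line; sort -rn sorts numerically,
# largest first; paste -sd ' ' - joins the lines with spaces for display.
sort_then_head=$(seq 1 100 | sort -rn | head -n 3 | paste -sd ' ' -)
head_then_sort=$(seq 1 100 | head -n 3 | sort -rn | paste -sd ' ' -)

echo "sort | head: $sort_then_head"   # the three largest of all 100 numbers
echo "head | sort: $head_then_sort"   # only the first three numbers, reordered
```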

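## Grab rows with relevant info
`grep` prints only the lines that contain a pattern (here `PORCH`; the match is case-sensitive unless you add `-i`). The header line does not match, so `csvlook -H` (`--no-header-row`) tells csvlook not to treat the first match as column names:
```
grep "PORCH" building_permits.csv | head | csvlook -H | less -S
```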
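Since `grep` drops the header line, one way to keep the real column names is to print the header with `head -n 1` and the matches with `grep`, grouped in `{ ...; }` so both go into the same output. This is a minimal sketch on a tiny hypothetical sample file (made-up rows); `grep -i` makes the match case-insensitive.

```shell
# Hypothetical sample data in a temporary directory (made-up rows).
dir=$(mktemp -d)
cat > "$dir/sample_permits.csv" <<'EOF'
permit_id,work_description,estimated_cost
1001,ERECT REAR PORCH,$5000
1002,REPAIR ROOF,$12000
1003,REPLACE FRONT PORCH STAIRS,$3500
1004,INTERIOR RENOVATION,$20000
EOF

# { ...; } runs both commands and combines their output:
# first the header line, then every line containing "porch" in any letter case.
porch_rows=$({ head -n 1 "$dir/sample_permits.csv"; grep -i "porch" "$dir/sample_permits.csv"; })
echo "$porch_rows"

rm -r "$dir"   # clean up the temporary directory
```

With the real data, the same grouping can feed a pipe: `{ head -n 1 building_permits.csv; grep -i "porch" building_permits.csv; } | csvlook | less -S`.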

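## Count lines
`wc -l` counts lines. The count for the whole file includes the header line.
```
wc -l < building_permits.csv                 # lines in the file, header included
grep "PORCH" building_permits.csv | wc -l    # lines that contain PORCH
```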
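Because the header is a line too, the number of data rows is one less than `wc -l` reports, and `grep -c` counts matching lines without a separate `wc`. A minimal sketch on a tiny hypothetical sample file; note that `wc -l` counts newline characters, so a file without a final newline, or a CSV whose quoted fields contain line breaks, will not give an exact row count.

```shell
# Hypothetical sample data in a temporary directory (made-up rows).
dir=$(mktemp -d)
cat > "$dir/sample_permits.csv" <<'EOF'
permit_id,work_description,estimated_cost
1001,ERECT REAR PORCH,$5000
1002,REPAIR ROOF,$12000
1003,REPLACE FRONT PORCH STAIRS,$3500
1004,INTERIOR RENOVATION,$20000
EOF

# $(( ... )) turns wc's output into a plain number (some versions pad it with spaces).
total_lines=$(( $(wc -l < "$dir/sample_permits.csv") ))
data_rows=$(( total_lines - 1 ))                          # minus the header line
porch_rows=$(grep -c "PORCH" "$dir/sample_permits.csv")   # -c counts matching lines

echo "lines: $total_lines, data rows: $data_rows, PORCH rows: $porch_rows"

rm -r "$dir"   # clean up the temporary directory
```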

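## Search and replace
`sed 's/\$//g'` replaces every dollar sign with nothing: `$` is escaped as `\$` because it is special in regular expressions, and `g` applies the substitution to every match on a line, not just the first. `tr '[:upper:]' '[:lower:]'` converts uppercase letters to lowercase; quoting the character classes keeps the shell from treating them as filename patterns. Both commands change only the text flowing through the pipe, so `building_permits.csv` itself is untouched, which is why the replacement is temporary.
```
head building_permits.csv | csvlook | less -S
head building_permits.csv | sed 's/\$//g' | csvlook | less -S
head building_permits.csv | sed 's/\$//g' | tr '[:upper:]' '[:lower:]' | csvlook | less -S
```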
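A minimal sketch confirming that the change is temporary, on a tiny hypothetical sample file: the cleaned output has no dollar signs, while the file on disk still has all of them. To keep a cleaned copy, you could redirect the output to a new file (for example `> cleaned.csv`, a hypothetical name) rather than editing the original.

```shell
# Hypothetical sample data in a temporary directory (made-up rows).
dir=$(mktemp -d)
cat > "$dir/sample_permits.csv" <<'EOF'
permit_id,work_description,estimated_cost
1001,ERECT REAR PORCH,$5000
1002,REPAIR ROOF,$12000
1003,REPLACE FRONT PORCH STAIRS,$3500
1004,INTERIOR RENOVATION,$20000
EOF

# Delete every dollar sign and lowercase everything; only the output stream changes.
cleaned=$(sed 's/\$//g' "$dir/sample_permits.csv" | tr '[:upper:]' '[:lower:]')
echo "$cleaned"

# Count dollar-sign characters (tr -cd deletes everything except '$').
in_stream=$(( $(printf '%s\n' "$cleaned" | tr -cd '$' | wc -c) ))
in_file=$(( $(tr -cd '$' < "$dir/sample_permits.csv" | wc -c) ))
echo "dollar signs in output: $in_stream, in file: $in_file"

rm -r "$dir"   # clean up the temporary directory
```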

## Scripts
We can save commands in a script and run them later. The first line, the "shebang" (`#!`), tells the system which interpreter should run the script. You could point it at Python or many other interpreters, but `#!/bin/bash` is the most common. Save the following as `test_script.sh`; it removes all dollar signs, converts uppercase to lowercase, and sends the first ten lines to `csvlook` and `less`:
```
#!/bin/bash
head building_permits.csv | sed 's/\$//g' | tr '[:upper:]' '[:lower:]' | csvlook | less -S
```

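A file's permissions control who can read, write, or execute it. To run the script as a program, add execute permission:

```
chmod +x test_script.sh
```

Then run it. `./test_script.sh` uses the shebang line to pick the interpreter; `bash test_script.sh` runs the file with bash explicitly and works even without execute permission:

```
./test_script.sh
bash test_script.sh
```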
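A minimal sketch of a slightly more reusable script that takes the file name as its first argument (`"$1"`) instead of hard-coding it. The script name `clean_preview.sh` and the sample data are hypothetical, and the script stops before `csvlook | less -S` so its output can be checked or piped further. `set -euo pipefail` is an optional bash setting that makes the script exit on errors, on unset variables (such as a missing argument), and on a failure anywhere in a pipeline.

```shell
# Hypothetical sample data in a temporary directory (made-up rows).
dir=$(mktemp -d)
cat > "$dir/sample_permits.csv" <<'EOF'
permit_id,work_description,estimated_cost
1001,ERECT REAR PORCH,$5000
1002,REPAIR ROOF,$12000
1003,REPLACE FRONT PORCH STAIRS,$3500
1004,INTERIOR RENOVATION,$20000
EOF

# Write the hypothetical script. Quoting 'EOF' keeps "$1" and \$ from being
# expanded while the script file is created.
cat > "$dir/clean_preview.sh" <<'EOF'
#!/bin/bash
# Exit on errors, on unset variables, and on a failure anywhere in a pipeline.
set -euo pipefail
# "$1" is the CSV to preview: first ten lines, dollar signs removed, lowercased.
head "$1" | sed 's/\$//g' | tr '[:upper:]' '[:lower:]'
EOF

chmod +x "$dir/clean_preview.sh"                               # make it executable
preview=$("$dir/clean_preview.sh" "$dir/sample_permits.csv")   # run it on the sample
echo "$preview"

rm -r "$dir"   # clean up the temporary directory
```

With the real data, you might run `./clean_preview.sh building_permits.csv | csvlook | less -S`.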

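## Help
Most commands have a manual page (open it with `man`, press `q` to quit), and many also print a short usage summary with `--help`:
```
man grep
grep --help
csvlook --help
```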