In [2]:
import pandas as pd

# Pandas

In [100]:
users = pd.read_table('./data/user.tbl', sep='|')
ufo = pd.read_csv('./data/ufo.csv')

### Apply

In [134]:
users.head()

# The comparison already returns a bool, so no if/else is needed.
users["retireable"] = users["age"].apply(lambda x: int(x) > 65)
# users[users["retireable"]]  # keep only the rows where retireable is True

users.retireable.value_counts()
users.loc[:, "retireable"].value_counts()
users[users["age"] > 65]


Unnamed: 0,user_id,age,gender,occupation,zip_code,retireable
210,211,66,M,salesman,32605,True
348,349,68,M,retired,61455,True
480,481,73,M,retired,37771,True
558,559,69,M,executive,10022,True
572,573,68,M,retired,48911,True
584,585,69,M,librarian,98501,True
766,767,70,M,engineer,0,True
802,803,70,M,administrator,78212,True
859,860,70,F,retired,48322,True


In [58]:
ufo.head()


Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


In [31]:
# apply an arbitrary function to each value of a Pandas column,
# storing the result in a new column



# ufo["Shape Reported"].value_counts()
# ufo["CityState"] = ufo["City"] + ", " + ufo["State"]
# ufo["CityState"] = ufo.apply(lambda row: f'{row["City"]}, {row["State"]}', axis=1)

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00
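
One possible way to fill in the cell above, as a minimal sketch: `Series.apply` calls a function once per value and returns a new Series. The small DataFrame below copies the five rows shown in the `ufo.head()` output instead of reading `./data/ufo.csv`, and the column name `shape_length` is a hypothetical choice.

```python
import pandas as pd

# Stand-in for the first rows of ufo.csv (copied from the head() output above).
ufo_sample = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "New York Worlds Fair"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL", "DISK", "LIGHT"],
    "State": ["NY", "NJ", "CO", "KS", "NY"],
})

# Series.apply calls len() once per value in the column.
# "shape_length" is a made-up column name for illustration.
ufo_sample["shape_length"] = ufo_sample["Shape Reported"].apply(len)
```

On the full dataset a column may contain missing values, in which case a function like `len` would need to handle `NaN` explicitly.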


In [None]:
# apply an arbitrary function to each row of a DataFrame,
# storing the result in a new column
ufo["State"]
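
A sketch of the row-wise case, assuming the goal is a combined city/state label: passing `axis=1` to `DataFrame.apply` hands each row to the function as a Series. The helper `city_state` and the column name `City_State` are hypothetical, and the rows are copied from the `ufo.head()` output.

```python
import pandas as pd

# Stand-in for a few rows of the ufo data.
ufo_sample = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke"],
    "State": ["NY", "NJ", "CO"],
})


def city_state(row):
    """Combine one row's City and State into a single label."""
    return f'{row["City"]}, {row["State"]}'


# axis=1 passes each row (as a Series) to the function.
ufo_sample["City_State"] = ufo_sample.apply(city_state, axis=1)
```

For simple string concatenation like this, `ufo_sample["City"] + ", " + ufo_sample["State"]` gives the same result without a row-by-row function call; `apply` with `axis=1` is more useful when the logic does not vectorize easily.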

### String Methods

In [None]:
# Use string methods to change State abbreviations in ufo data to uppercase
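
A minimal sketch of one way to do this with the `.str` accessor. The mixed-case values are made up for illustration; the states shown in `ufo.head()` are already uppercase.

```python
import pandas as pd

# Hypothetical state abbreviations with inconsistent casing and one missing value.
ufo_sample = pd.DataFrame({"State": ["ny", "Nj", "CO", None]})

# .str.upper() uppercases each string and leaves missing values missing.
ufo_sample["State"] = ufo_sample["State"].str.upper()
```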


In [None]:
# Get a Boolean series that indicates which elements of ufo
# "Colors Reported" column contain the substring "RED"
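
A sketch using `Series.str.contains`. The color values are made up, since the rows shown in `ufo.head()` have no colors reported. Passing `na=False` treats missing values as "does not contain", so the result is a clean Boolean Series.

```python
import pandas as pd

# Hypothetical values for the "Colors Reported" column, including a missing one.
colors = pd.Series(["RED", "ORANGE RED", "GREEN", None], name="Colors Reported")

# True where the string contains the substring "RED"; missing values become False.
has_red = colors.str.contains("RED", na=False)
```

The resulting Series can be used directly as a filter, for example `colors[has_red]`.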


### Datetimes

In [None]:
# convert a string to the datetime format
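
A sketch with `pd.to_datetime`, using time strings copied from the `ufo.head()` output. The explicit `format` is an assumption based on how those strings look (month/day/year, 24-hour clock).

```python
import pandas as pd

TIME_FORMAT = "%m/%d/%Y %H:%M"  # assumed from the strings in the head() output

# A single string becomes a Timestamp.
first_sighting = pd.to_datetime("6/1/1930 22:00", format=TIME_FORMAT)

# A Series of strings becomes a datetime64 Series.
times = pd.to_datetime(
    pd.Series(["6/1/1930 22:00", "6/30/1930 20:00"]), format=TIME_FORMAT
)
```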


In [None]:
# Get ufo Time column in hours
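
A sketch that reads "in hours" as the hour of day of each sighting, which is an assumption about the intent. Once the column is a datetime Series, the `.dt` accessor exposes components such as `hour`. The time strings are copied from the `ufo.head()` output.

```python
import pandas as pd

time_strings = pd.Series([
    "6/1/1930 22:00",
    "6/30/1930 20:00",
    "2/15/1931 14:00",
    "6/1/1931 13:00",
    "4/18/1933 19:00",
])

# Convert the strings first; .dt only works on datetime-like Series.
times = pd.to_datetime(time_strings, format="%m/%d/%Y %H:%M")

# Hour of day (0-23) for each sighting.
hours = times.dt.hour
```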


In [None]:
# Get number of days spanned by events in ufo dataset
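
A sketch of one approach: subtracting the earliest timestamp from the latest gives a `Timedelta`, whose `.days` attribute is the number of whole days. Only the first and last times from the `ufo.head()` output are used here, so the number applies to this sample rather than the full dataset.

```python
import pandas as pd

# First and last sighting times from the head() output above.
times = pd.to_datetime(
    pd.Series(["6/1/1930 22:00", "4/18/1933 19:00"]), format="%m/%d/%Y %H:%M"
)

# Latest minus earliest is a Timedelta.
span = times.max() - times.min()

# Whole days covered by the sample.
days_spanned = span.days
```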


### Changing How DataFrames Are Displayed

In [None]:
# change the maximum number of rows and columns printed
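
A sketch using the pandas options API. The limits chosen here (10 rows, 20 columns, 5 rows inside the context manager) are arbitrary example values.

```python
import pandas as pd

# Change the limits for the rest of the session.
pd.set_option("display.max_rows", 10)
pd.set_option("display.max_columns", 20)

# Change a limit temporarily; the previous value is restored on exit.
with pd.option_context("display.max_rows", 5):
    rows_inside = pd.get_option("display.max_rows")

rows_after = pd.get_option("display.max_rows")
```

`pd.reset_option("display.max_rows")` returns an option to its default value.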


# Python Style

**Warning:** Everybody likes clean, consistent code, but nobody likes a legalistic zealot.

### PEP8

[PEP8](https://www.python.org/dev/peps/pep-0008/) is the standard style guide for Python.

- Use 4-space indentation.
- Keep line lengths below 80 characters.
    - Prefer splitting inside `()`, `[]`, or `{}` instead of using a backslash (`\`).
    - Put the operator at the start of the new line instead of the end of the old line.

Yes:

```python
my_new_variable = (old_variable1
                   * old_variable2
                   + old_variable3
                   - old_variable4
                   - old_variable5)

my_new_variable = (
    old_variable1 * old_variable2 + old_variable3
    - old_variable4 - old_variable5
    )

my_new_variable = (old_variable1 * old_variable2 + old_variable3
                   - old_variable4 - old_variable5)
```

No:

```python
my_new_variable = old_variable1 * old_variable2 + old_variable3 - old_variable4 - old_variable5

my_new_variable = old_variable1\
                  * old_variable2\
                  + old_variable3\
                  - old_variable4\
                  - old_variable5
```

- Creating strings with either single or double quotes is fine. If your string contains quotes of one type, use the other type to create it so that you don't have to use backslashes in the string.

Yes:

```python
'dog'
"cat"
"can't"
'She said "run!"'
```

No:

```python
'can\'t'
"She said \"run!\""
```

- [Whitespace rules](https://www.python.org/dev/peps/pep-0008/#id27)
    - Always surround these binary operators with a single space on either side: assignment (`=`), augmented assignment (`+=`, `-=`, etc.), comparisons (`==`, `<`, `>`, `!=`, `<=`, `>=`, `in`, `not in`, `is`, `is not`), Booleans (`and`, `or`, `not`).
    - Don't use spaces around the `=` sign when used to indicate a keyword argument or a default parameter value.

Yes:
```python
def complex(real, imag=0.0):
    return magic(r=real, i=imag)
```

No:
```python
def complex(real, imag = 0.0):
    return magic(r = real, i = imag)
```

### LowClass Python

[This style guide](http://columbia-applied-data-science.github.io/pages/lowclass-python-style-guide.html) is meant for data scientists and others who write code but aren't exactly professional programmers.

- Write functions that take well-defined inputs and produce well-defined output.
- Do not have multiple levels of nesting within a function. As soon as you drop down to a lower level of abstraction, create a helper function.

```python
def extract_feature_counts(data_string):
    """
    Some docstring here...
    """
    cleaned_data_string = _clean(data_string)
    word_counts = _count_words(cleaned_data_string)

    return word_counts


def _clean(data_string):
    # Some code here.
    return cleaned_data_string


def _count_words(data_string):
    # Some code here.
    return word_counts
```

**Notice:**

- A leading underscore in a function name indicates that the function is "private," meaning that you don't intend for anyone else to use it.
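
As an illustration only, here is one way the placeholders in the skeleton above could be filled in. The cleaning rule (lowercase, keep only letters and whitespace) and the use of `collections.Counter` are assumptions made for this sketch, not part of the style guide.

```python
import re
from collections import Counter


def extract_feature_counts(data_string):
    """Return a Counter mapping each word in data_string to its frequency."""
    cleaned_data_string = _clean(data_string)
    word_counts = _count_words(cleaned_data_string)

    return word_counts


def _clean(data_string):
    """Lowercase the text and replace non-letter characters with spaces."""
    return re.sub(r"[^a-z\s]", " ", data_string.lower())


def _count_words(data_string):
    """Count whitespace-separated words."""
    return Counter(data_string.split())
```

Each function takes a well-defined input and returns a well-defined output, and the top-level function reads as a summary of the steps: `extract_feature_counts("The cat saw the dog.")` cleans the text, then counts the words.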

### Additional Resources

- [Google Style Guide](http://google.github.io/styleguide/pyguide.html)
- [Clean Code](https://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882)
- [The Pragmatic Programmer](https://www.amazon.com/Pragmatic-Programmer-Journeyman-Master/dp/020161622X)

# Jupyter Notebooks, REPLs, Text Editors, and IDEs

### REPLs

- REPL = Read-Evaluate-Print Loop.
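
To make the four steps concrete, here is a toy sketch of the loop. The function `tiny_repl` is hypothetical: it only handles expressions (not statements), does no error handling, and collects the output in a list instead of printing it, so it is far simpler than what `python` or `ipython` actually does.

```python
def tiny_repl(lines):
    """Run a miniature read-evaluate-print loop over an iterable of expressions."""
    namespace = {}
    printed = []
    for line in lines:                      # Loop: repeat for every input line
        source = line.strip()               # Read: take one line of input
        result = eval(source, namespace)    # Evaluate: compute its value
        printed.append(repr(result))        # Print: record the value's repr
    return printed


session_output = tiny_repl(["1 + 1", "'dog'.upper()", "len([1, 2, 3])"])
```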



### Jupyter Notebooks

- Jupyter Notebooks are browser-based frontends to REPL sessions (using the IPython kernel for Python).
- Ability to mix text and media with code is powerful for documentation and reporting.
- Ability to run cells out of order is a double-edged sword.
    - Great for exploration and presentation.
    - Doesn't enforce reproducibility.

### Text Editors


##### Options

- **Too simple:** Notepad, TextEdit, Nano
- **Powerful but demanding:** Emacs, Vim
- **Approachable and moderately powerful:** Sublime Text, Atom



### IDEs

##### Options

- Rodeo (similar to RStudio)
- Spyder (similar to Matlab)
- PyCharm (aimed at Python developers)



##### Opinionated Advice

- **For this class:** Use Jupyter.
- **For quick one-off tasks:** Use `ipython`.
- **When writing code to run repeatedly:** Move to Atom.
- **When building Python packages:** Move to PyCharm.
- **At some point:**
    - Learn enough Vim to not embarrass yourself. If you like it, enable Vim keybindings elsewhere.
    - Try out Emacs, Rodeo, and Spyder.

# Practice

Work on `lesson07_exploratory_data_analysis/practice/eda-data_cleaning_intro-lab-master/pandas-cleaning-apply.ipynb` in pairs using the driver/navigator approach. One person (the "driver") writes the code (sharing his or her screen) while the other person (the "navigator") continually makes suggestions and reviews the code. The driver should talk about what he or she is doing, ask for input, and generally keep the navigator engaged.

Many professional programmers swear by this "pair programming" approach to software development.

After half of our time is up, I will have you switch roles and work on `lesson06_experiments_and_hypothesis_testing/practice/eda-telecomm_group_project-lab-master/telecomm-eda-group-lab.ipynb`.

If you have already completed more than a little bit of one of these notebooks, let me know and we will figure out something else for you to work on.

# Unit Project 2

https://git.generalassemb.ly/datr1618/unit_project2

# Questions?

# Exit Ticket

```
=========================================
@channel
Exit Ticket: https://goo.gl/forms/OUw4gyTiRKMOTI3t2        

#feedback
=========================================
```