# Setup

## Install Python

Anaconda, an all-in-one installer, is recommended for data science.<br>
https://www.anaconda.com/distribution/

For a more lightweight installation, you can use a package manager like Homebrew for Mac or Chocolatey for Windows. See a guide here: https://docs.python-guide.org/starting/installation/.

Regardless of how you choose to install it, please make sure you install Python version 3 (e.g., 3.7).
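
To confirm which version you ended up with, a quick check along these lines may help. This is a minimal sketch; the helper name `is_python3` is made up for illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the major version number is 3 or later."""
    return version_info[0] >= 3


# sys.version starts with the version number, e.g. "3.x.y ...".
print("Running Python", sys.version.split()[0])
print("Python 3 or later:", is_python3())
```

If the second line prints `False`, the `python` command on your machine probably still points at an older Python 2 installation.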

## Setting up a "virtual environment"

In the course of working with data or web development, we will group our work into different projects for different purposes. Each project has different needs, so each one will need different tools.

The first step in starting a Python project is to set up a "virtual environment". This is a fancy term that basically means that, for each project we work on, we set up a different box -- or environment -- to run the project in. We do this because different projects use different tools. Each box holds just the tools needed for that project.

If we use one box for all our projects, eventually, we'll have too many tools in one box and the tools may be in conflict with one another.
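
As a rough illustration of "one box per project", the sketch below builds an environment with Python's standard-library `venv` module. The function name `make_env` and the folder name `env` are assumptions made for this example. Most people create environments from the command line instead (for example `python3 -m venv env`), which does the same job.

```python
import tempfile
import venv
from pathlib import Path


def make_env(project_dir):
    """Create a virtual environment in <project_dir>/env and return its path."""
    env_dir = Path(project_dir) / "env"
    # with_pip=False keeps this sketch fast and offline. For a real project,
    # pass with_pip=True so that pip is available inside the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as project_dir:
    env_dir = make_env(project_dir)
    print("Created:", env_dir)
    print("Has config file:", (env_dir / "pyvenv.cfg").exists())
```

Each project folder would get its own `env` directory, so the tools installed for one project stay out of the others.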

More common terms we'll hear for tools in a programming project are "dependencies", "packages", "libraries", or "modules".  Technically, these refer to slightly different things, but they are used interchangeably to mean: bundles of code organized and coded for specific purposes, such as the "random" module for generating random numbers in all sorts of ways.  Often, these dependencies, packages, libraries, or modules are made by other people.
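
For instance, here is a small sketch of the "random" module in use. It ships with Python, so there is nothing to install. The seed value and the list of names are arbitrary choices for this example.

```python
import random

random.seed(42)  # a fixed seed makes the run repeatable

roll = random.randint(1, 6)  # whole number from 1 to 6, both ends included
pick = random.choice(["pandas", "matplotlib", "jupyter"])  # one item from a list
sample = random.random()  # decimal number from 0.0 up to (not including) 1.0

print(roll, pick, sample)
```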

## Best practices: setting up venv

https://packaging.python.org/tutorials/installing-packages/#creating-virtual-environments

## Let's google virtual environment

virtualenv docs<br>
https://virtualenv.pypa.io/en/latest/
<br><br>
virtualenv user guide<br>
https://virtualenv.pypa.io/en/latest/userguide/

## Install packages

pandas docs<br>
http://pandas.pydata.org/

matplotlib docs<br>
https://matplotlib.org/

jupyter docs<br>
https://jupyter.readthedocs.io/en/latest/index.html
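
These packages are typically installed with pip inside the activated environment (for example `pip install pandas matplotlib jupyter`). To see whether an install worked, one option is to ask Python whether it can find each package. The sketch below is one way to do that; `missing_packages` is a hypothetical helper name, and the list of names can be adjusted.

```python
from importlib import util


def missing_packages(names):
    """Return the names from `names` that cannot be imported here."""
    return [name for name in names if util.find_spec(name) is None]


missing = missing_packages(["pandas", "matplotlib"])
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All packages found.")
```

Run it from the same environment you plan to work in, since each virtual environment has its own set of installed packages.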

Options for how to set up virtual environments:

### virtualenv

virtualenv needs to be installed separately, but it supports both Python 2.7+ and Python 3.3+.

### pipenv

Learn more about pipenv here: https://docs.python-guide.org/dev/virtualenvs/

### conda envs

# Data science best practices with pandas

GitHub repository https://github.com/codeparkhouston/intro-to-data-with-python

If you want to directly download only the CSV file, right click on the following link and select "Save As": ted.csv.

pandas documentation: http://pandas.pydata.org/pandas-docs/stable/

## The pandas library is a powerful tool for multiple phases of the data science workflow.

* data cleaning 
* visualization
* exploratory data analysis
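
The three phases above can be sketched on a tiny table. The data and column names below are made up for illustration and are not taken from the TED dataset.

```python
import pandas as pd

# Hypothetical toy data; the titles and numbers are invented.
raw = pd.DataFrame({
    "title": ["Talk A", "Talk B", "Talk C", "Talk D"],
    "views": [1200, None, 560, 3100],
    "comments": [15, 4, None, 42],
})

# Data cleaning: drop rows that have any missing value.
clean = raw.dropna()

# Exploratory data analysis: summary statistics for the numeric columns.
summary = clean.describe()
print(summary)

# Visualization (needs matplotlib): uncomment to draw a bar chart.
# clean.plot(x="title", y="views", kind="bar")
```

Dropping rows is only one way to handle missing values; depending on the question, filling them in may be a better choice.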

### However, the size and complexity of the pandas library makes it challenging to discover the best way to accomplish any given task.

### More public data can be found here:

https://github.com/mwaskom/seaborn-data
    
https://github.com/rfordatascience/tidytuesday/tree/master/data

### Check that pandas and matplotlib are properly installed

In [None]:
# If you're using the Jupyter notebook, run the following code:

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('../data/ted.csv')
df.comments.plot()

In [None]:
# If you're using any Python environment other than the Jupyter notebook, run the following code:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('../data/ted.csv')
df.comments.plot()
plt.show()
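
If `pd.read_csv` fails with a `FileNotFoundError`, the relative path `'../data/ted.csv'` probably does not point where you expect from your current working directory. The sketch below shows one way to check; `describe_csv_path` is a hypothetical helper name.

```python
from pathlib import Path


def describe_csv_path(relative_path="../data/ted.csv"):
    """Return the absolute path and whether a file exists there."""
    resolved = (Path.cwd() / relative_path).resolve()
    return resolved, resolved.is_file()


resolved, found = describe_csv_path()
print("Looking for:", resolved)
print("Found:", found)
```

When `Found` is `False`, either move into the folder the notebook expects or pass the printed absolute path of your downloaded file to `pd.read_csv` instead.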