# Week 1: Course overview

This course is a practical exploration of environmental data and analysis. The **objective** of this first lecture is to introduce both the course structure and the way we will work with data and code.  We will rely exclusively on Jupyter Notebooks to edit and execute Python code. I encourage you to follow along in lecture in your own Notebook.

You can, if you want, [run the Notebook locally](https://docs.jupyter.org/en/latest/running.html), but don't worry about it. We will use [Colab](https://colab.research.google.com/) to run our Notebooks, which interleave narrative and code (note: you need a Google account, such as a Gmail account, to use it). To make sure you're up and running, execute this code in a live Notebook.

In [None]:
print("hello world")

hello world


# Why Python for Data Analysis?

Since its first appearance in 1991, Python has become one of the most popular **interpreted (vs. compiled)** programming languages, along with Perl, Ruby, and others.

In the last 20 years, Python has gone from a bleeding-edge or "at your own risk" scientific computing language to one of the most important languages for data science, machine learning, and general software development in academia and industry.

**Two-language problem**

In many organizations, it is common to research, prototype, and test new ideas using a more specialized computing language like SAS or R and then later port those ideas to be part of a larger production system written in, say, Java, C#, or C++. What people are increasingly finding is that Python is a suitable language not only for doing research and prototyping but also for building the production systems.

# Variables and types

Basic variable types in Python consist of strings and numeric types. Let's look at both of
these types in this section.

## Strings

In Python, a **string** is a variable type that stores text: letters, digits, special characters, and punctuation. We use single or double quotation marks to
indicate that a value is a string rather than a number:

In [None]:
var = 'Hello, World!'
print(var)

Hello, World!


Strings cannot be used for mathematical operations on numbers. But they can be used for
other useful operations, as we see in the following example:

In [None]:
string_1 = '1'
string_2 = '2'
string_sum = string_1 + string_2
print(string_sum)

12


The result of the preceding code is to print string '12', not '3'. Instead of adding the two
numbers, the + operator performs **concatenation** (appending the second string to the end of the first string) in Python when operating on two strings.

Other operators that act on strings include the \* operator (for repeating a string a number of times, for example, string_1 \* 3) and the < and > operators (which compare strings character by character, using the numeric code of each character, such as its ASCII value).
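As a small illustration of these operators, here is a sketch you can run in a Notebook cell. The example strings are made up for this illustration; the comments show the printed result.

```python
string_1 = '1'

# The * operator repeats a string a given number of times.
repeated = string_1 * 3
print(repeated)            # 111

# < and > compare strings character by character, using character codes.
print('apple' < 'banana')  # True: 'a' comes before 'b'
print('Zebra' < 'apple')   # True: uppercase letters have lower codes than lowercase
print('10' < '9')          # True: '1' comes before '9', even though 10 > 9 as numbers
```

The last line is a common surprise: digits stored as strings are compared as text, not as numbers.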

To convert data from a numeric type to a string, we can use the built-in **str()** function.
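For example, calling str() on a number gives back its text form, which can then be concatenated with other strings. The variable names and values below are hypothetical, chosen only for this sketch.

```python
count = 42          # an integer (hypothetical value)
ratio = 3.5         # a floating-point number (hypothetical value)

count_text = str(count)
ratio_text = str(ratio)

print(count_text + ' samples')   # 42 samples
print(type(count_text))          # <class 'str'>
print(ratio_text)                # 3.5
```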

Because strings are sequences (of characters), we can index them and slice them (like we can do with other data containers, as you will see later). A slice is a contiguous section of a string. To index/slice them, we use integers enclosed in square brackets to indicate the
character's position:

In [None]:
test_string = 'Environment Economics'
print(test_string[0])

E


Python is zero-indexed, so the first index is `0` and the second is `1` and so on.

To slice strings, we include the beginning and end positions, separated by a colon, in the square brackets. Note that the slice includes all the characters from the beginning position up to, but not including, the end position, as we see in the following example:

In [None]:
print(test_string[0:6])

Enviro
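A few more indexing and slicing patterns are worth trying on the same string. Negative indices and omitted start or end positions are not covered in the text above; they are included here as standard Python behavior that you will meet often.

```python
test_string = 'Environment Economics'

print(test_string[1])      # n  (second character, index 1)
print(test_string[-1])     # s  (negative indices count from the end)
print(test_string[:11])    # Environment  (start omitted: begin at index 0)
print(test_string[12:])    # Economics    (end omitted: run to the end)
print(len(test_string))    # 21
```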


Earlier, we mentioned the **str()** function. Strings also have dozens of methods of their own. A full list of them is available in the online Python documentation at www.python.org.

Methods include those for case conversion, finding specific substrings, and stripping whitespace. We'll discuss one more method here: the **split()** method. The split() method acts on a string and takes a *separator* argument.

The output is a list of strings; each item in the list is a component of the original string, split
at the separator. This is very useful for parsing strings that are delimited by punctuation
characters such as `,` or `;`. We will discuss lists shortly. Here is an example of the
split() method:

In [None]:
test_split_string = 'Jones,Bill,49,Atlanta,GA,12345'
output = test_split_string.split(',')
print(output)

['Jones', 'Bill', '49', 'Atlanta', 'GA', '12345']
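The other kinds of string methods mentioned above (case conversion, substring search, and whitespace stripping) can be sketched the same way. The input string here is hypothetical; strip(), upper(), lower(), and find() are standard string methods.

```python
raw = '  Atlanta, GA  '          # hypothetical input with stray whitespace

cleaned = raw.strip()            # remove leading and trailing whitespace
print(cleaned)                   # Atlanta, GA
print(cleaned.upper())           # ATLANTA, GA
print(cleaned.lower())           # atlanta, ga
print(cleaned.find('GA'))        # 9  (index where the substring starts)
print(cleaned.find('TX'))        # -1 (substring not found)
```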


## Numeric types

The two numeric types in Python that are most useful for analytics are **integers** and **floating-point numbers**. To convert to these types, you can use the **int()** and **float()**
functions, respectively. The most common operations on numbers are supported with the usual operators: +, -, *, /, <, and >. Modules containing functions
that are particularly useful for analytics include **math** and **random**.
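Here is a brief sketch of these conversions, operators, and modules. The values are hypothetical, and the random seed is fixed only so that the draw is repeatable.

```python
import math
import random

age = int('49')            # string -> integer
price = float('3.75')      # string -> floating-point number

print(age + 1)             # 50
print(price * 2)           # 7.5
print(7 / 2)               # 3.5  (the / operator always returns a float)
print(age > price)         # True

print(math.sqrt(16))       # 4.0
print(math.floor(price))   # 3

random.seed(0)             # fix the seed so the draw is repeatable
draw = random.random()     # a float in the interval [0.0, 1.0)
print(0.0 <= draw < 1.0)   # True
```

Note that items produced by split() are strings, so a value such as '49' needs int() or float() before you can do arithmetic with it.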

## The Boolean type

The Boolean type is a special integer type that can be used to represent the *True* and
*False* values. To convert an integer to a Boolean type, you can use the **bool()** function. A
zero gets converted to *False*; any other integer gets converted to *True*. Boolean variables behave like 1 (True) and 0 (False) in arithmetic, except that they become the strings *True* and *False*, respectively, when converted to strings.
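The following sketch illustrates this behavior. The integer values are arbitrary examples.

```python
print(bool(0))         # False
print(bool(5))         # True
print(bool(-1))        # True

print(True + True)     # 2  (True behaves like 1 in arithmetic)
print(False * 10)      # 0  (False behaves like 0)
print(str(True))       # True  (the string 'True', not '1')
print(isinstance(True, int))  # True
```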

# References

1. Kumar, V. (2018) Healthcare Analytics Made Simple: Techniques in healthcare computing using machine learning and Python. Packt Publishing.

2. McKinney, W. (2022) Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter, 3rd Edition. O’Reilly Media, Inc. open access: https://wesmckinney.com/book/