## Selecting columns 

We begin by importing pandas:

In [2]:
import pandas as pd

First, we need a table to work from. In SQL, we create the table and then insert the values we want into it:

```sql
CREATE TABLE #ListOfNames
(
    ID int,
    FullName varchar(20)
);

INSERT INTO #ListOfNames (ID, FullName)
VALUES
    (1, 'Jerry'),
    (2, 'Terry'),
    (3, 'Mary'),
    (4, 'Larry');
```

Unlike in SQL, we don't need separate statements for creating the data frame and inserting values. We do it all in one go with the `DataFrame` class. Part of what makes pandas great is how flexible it is with inputs: it accepts lists, dicts, data imported from CSV files, and more. That flexibility comes at a cost, though: it can be hard to know which method to choose.
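
As a rough illustration of that flexibility, the sketch below builds the same small table three ways: from a list of rows, from a dict of columns, and from a list of row dicts. The variable names (`from_rows`, `from_dict`, `from_records`) are only illustrative.

```python
import pandas as pd

# 1. A list of rows plus a separate list of column names
from_rows = pd.DataFrame(
    data=[[1, 'Jerry'], [2, 'Terry'], [3, 'Mary'], [4, 'Larry']],
    columns=['ID', 'FullName'],
)

# 2. A dict that maps each column name to its values
from_dict = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'FullName': ['Jerry', 'Terry', 'Mary', 'Larry'],
})

# 3. A list of dicts, one dict per row
from_records = pd.DataFrame([
    {'ID': 1, 'FullName': 'Jerry'},
    {'ID': 2, 'FullName': 'Terry'},
    {'ID': 3, 'FullName': 'Mary'},
    {'ID': 4, 'FullName': 'Larry'},
])

# Check whether the three inputs describe the same table
print(from_rows.equals(from_dict))
print(from_rows.equals(from_records))
```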

To scrape together a data frame for quick testing, providing the rows as a list of lists makes sense. To do that, we can take our SQL and slightly adjust the parentheses and brackets.

```sql
(1, 'Jerry')
```

becomes

```python
[1, 'Jerry']
```

We wrap all the rows in one list and call it our data:
```python
data = [[1, 'Jerry'], [2, 'Terry'], [3, 'Mary'], [4, 'Larry']]
```

This list of lists becomes our rows, so we just need to define the column names. Again we use a list, and call it our columns:
```python
columns = ['ID', 'FullName']
```

Now all we have to do is pass these into a DataFrame:


In [4]:
# Creating a data frame
df = pd.DataFrame(data=[[1, 'Jerry'], [2, 'Terry'], [3, 'Mary'], [4, 'Larry']], columns=['ID', 'FullName'])
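
The same call can also be written with the `data` and `columns` variables defined earlier, which keeps the line short. A small sketch, with a couple of quick checks on the result:

```python
import pandas as pd

data = [[1, 'Jerry'], [2, 'Terry'], [3, 'Mary'], [4, 'Larry']]
columns = ['ID', 'FullName']

# Pass the rows and the column names by name
df = pd.DataFrame(data=data, columns=columns)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names, in order
```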

Now that we have a dataframe, we can start experimenting with different ways to select data from it. So given our dataframe, `df`, how do we `SELECT *`?
```sql
Select * from df
```

We do that by just writing the name of the dataframe and executing it:

In [5]:
# The SELECT * of pandas: just write the name
df

|   | ID | FullName |
|---|----|----------|
| 0 | 1 | Jerry |
| 1 | 2 | Terry |
| 2 | 3 | Mary |
| 3 | 4 | Larry |


Of course, the number of rows actually displayed depends on some options that you can tweak in pandas. We will talk about limiting rows in a bit.
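
For instance, the `display.max_rows` option controls how many rows pandas prints before it truncates the output. The sketch below changes it temporarily with `pd.option_context`; the 50-row frame `long_df` and the value 10 are arbitrary choices for illustration.

```python
import pandas as pd

# A throwaway frame with 50 rows, just to have something long to display
long_df = pd.DataFrame({'n': range(50)})

# The current setting
default_max_rows = pd.get_option('display.max_rows')
print(default_max_rows)

# Temporarily lower the limit; the previous value is restored after the block
with pd.option_context('display.max_rows', 10):
    inside = pd.get_option('display.max_rows')
    print(long_df)  # 50 rows exceed the limit, so the printout is truncated

# Back to the earlier setting
print(pd.get_option('display.max_rows') == default_max_rows)
```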

One thing to note is that, unlike a SQL table, a pandas DataFrame automatically has an index (the unlabeled column starting with 0). We will talk about indexes more after the basics have been covered.
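
A quick way to look at that index is the `index` attribute. A minimal sketch using the same four-row table; when no index is supplied, pandas creates a default `RangeIndex`:

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 'Jerry'], [2, 'Terry'], [3, 'Mary'], [4, 'Larry']],
                  columns=['ID', 'FullName'])

# pandas created this index for us; we never supplied it
print(df.index)

# The labels it holds, as a plain list
print(list(df.index))
```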

Often in SQL we have a large table with tens or hundreds of columns, and we are only interested in a handful of them. So naturally we limit the columns we show by listing them explicitly in the select statement instead of using `SELECT *`.

For this example, let's create a table with more columns:

In [6]:
df = pd.DataFrame(data = [['Jerry', 45, 0, 'gmail', 'Tim'],
                          ['Terry', 5, 1, 'gmail', 'Buddy'],
                          ['Mary', 30, 1, 'yahoo', 'Cooper'],
                          ['Larry', 45, 0, 'MSN', 'Buddy'],
                          ['Gary', 10, 1, 'MSN', 'Slappy'],
                          ['Larry', 45, 0, 'MSN', 'Buddy']
                         ], 
                  columns = ['Person', 'Age', 'HappyFlag', 'Email', 'DogName'])

In [7]:
df

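|   | Person | Age | HappyFlag | Email | DogName |
|---|--------|-----|-----------|-------|---------|
| 0 | Jerry | 45 | 0 | gmail | Tim |
| 1 | Terry | 5 | 1 | gmail | Buddy |
| 2 | Mary | 30 | 1 | yahoo | Cooper |
| 3 | Larry | 45 | 0 | MSN | Buddy |
| 4 | Gary | 10 | 1 | MSN | Slappy |
| 5 | Larry | 45 | 0 | MSN | Buddy |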
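
With a wider table, it helps to check which columns are available before choosing some, much like looking up a table's column list in SQL. A small sketch using the `columns` and `dtypes` attributes on the same data:

```python
import pandas as pd

df = pd.DataFrame(data=[['Jerry', 45, 0, 'gmail', 'Tim'],
                        ['Terry', 5, 1, 'gmail', 'Buddy'],
                        ['Mary', 30, 1, 'yahoo', 'Cooper'],
                        ['Larry', 45, 0, 'MSN', 'Buddy'],
                        ['Gary', 10, 1, 'MSN', 'Slappy'],
                        ['Larry', 45, 0, 'MSN', 'Buddy']],
                  columns=['Person', 'Age', 'HappyFlag', 'Email', 'DogName'])

# Column names, in the order they appear in the table
print(list(df.columns))

# The data type pandas inferred for each column
print(df.dtypes)
```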


We have a couple of options for selecting one particular column, for example Age:
```sql
Select Age from df
```
One option is the `DataFrame.Column` attribute syntax. Note that even when selecting a single column, the index is returned as well.

In [8]:
df.Age

0    45
1     5
2    30
3    45
4    10
5    45
Name: Age, dtype: int64

The other option, which is preferable, is to use brackets. Brackets are known as an **indexer** (along with `.loc` and `.iloc`), and we can put various things inside the indexer to help us subset our data, including the name of a column:

In [9]:
df['Age']

0    45
1     5
2    30
3    45
4    10
5    45
Name: Age, dtype: int64
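
One reason to prefer brackets: attribute access only works when the column name is a valid Python identifier and does not clash with something the DataFrame already defines. The sketch below uses a hypothetical frame, `pets`, with two awkward column names to show the difference.

```python
import pandas as pd

# Hypothetical frame whose column names are awkward for attribute access
pets = pd.DataFrame(
    data=[['Jerry', 'Tim', 1], ['Terry', 'Buddy', 2]],
    columns=['Person', 'Dog Name', 'count'],
)

# 'count' is also a DataFrame method, so attribute access returns the method,
# not the column
attr_result = pets.count
print(callable(attr_result))

# Brackets always look up the column by name
print(pets['count'].tolist())

# A name with a space cannot be written as an attribute at all,
# but works fine inside brackets
print(pets['Dog Name'].tolist())
```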

At this point, note that selecting a single column this way returns a different kind of object: what is displayed as output is technically no longer a DataFrame. We can check the type of the object being returned with the `type()` function:

In [11]:
#Our DataFrame has type DataFrame
type(df)

pandas.core.frame.DataFrame

In [13]:
#When we select the age column in the way demonstrated above, we get a Series
type(df['Age'])

pandas.core.series.Series
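
A few attributes make the difference concrete: a Series is one-dimensional and carries the column name as its `name`, while the DataFrame is two-dimensional. A brief sketch on the same data (the variable name `ages` is just for the example):

```python
import pandas as pd

df = pd.DataFrame(data=[['Jerry', 45, 0, 'gmail', 'Tim'],
                        ['Terry', 5, 1, 'gmail', 'Buddy'],
                        ['Mary', 30, 1, 'yahoo', 'Cooper'],
                        ['Larry', 45, 0, 'MSN', 'Buddy'],
                        ['Gary', 10, 1, 'MSN', 'Slappy'],
                        ['Larry', 45, 0, 'MSN', 'Buddy']],
                  columns=['Person', 'Age', 'HappyFlag', 'Email', 'DogName'])

ages = df['Age']

# Number of dimensions: Series vs. DataFrame
print(ages.ndim, df.ndim)

# Shape: a Series has only a length, a DataFrame has rows and columns
print(ages.shape, df.shape)

# The Series remembers which column it came from
print(ages.name)
```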

There is a way to select just a single column and still get a DataFrame back, but it follows naturally from selecting multiple columns. So let's select 3 of the 5 columns in our table:
```sql
Select Person, Age, Email from df
```
For this, we can simply pass a list of the columns we want returned to our indexer. The list is
```python
['Person', 'Age', 'Email']
```
and we will put that list inside our brackets (thus creating double brackets) as follows:

In [14]:
# Note the double [[: we are passing the list ['Person', 'Age', 'Email'] into df[]
df[['Person', 'Age', 'Email']]

|   | Person | Age | Email |
|---|--------|-----|-------|
| 0 | Jerry | 45 | gmail |
| 1 | Terry | 5 | gmail |
| 2 | Mary | 30 | yahoo |
| 3 | Larry | 45 | MSN |
| 4 | Gary | 10 | MSN |
| 5 | Larry | 45 | MSN |


You can tell by the look and the feel of the output that this is a dataframe, but just to confirm:

In [15]:
type(df[['Person', 'Age', 'Email']])

pandas.core.frame.DataFrame
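
The order of the names in the list also sets the order of the columns in the result, just as the order of a SQL select list does. A short sketch, which also shows that asking for a column that does not exist raises a `KeyError` (the name `Salary` is made up for the example):

```python
import pandas as pd

df = pd.DataFrame(data=[['Jerry', 45, 0, 'gmail', 'Tim'],
                        ['Terry', 5, 1, 'gmail', 'Buddy'],
                        ['Mary', 30, 1, 'yahoo', 'Cooper'],
                        ['Larry', 45, 0, 'MSN', 'Buddy'],
                        ['Gary', 10, 1, 'MSN', 'Slappy'],
                        ['Larry', 45, 0, 'MSN', 'Buddy']],
                  columns=['Person', 'Age', 'HappyFlag', 'Email', 'DogName'])

# Columns come back in the order we list them, not the table's order
reordered = df[['Email', 'Person']]
print(list(reordered.columns))

# 'Salary' is not a column in df, so the lookup fails
try:
    df[['Person', 'Salary']]
    missing_raised = False
except KeyError as err:
    missing_raised = True
    print('KeyError:', err)
```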

As you may have guessed, passing a list into our indexer made it return a DataFrame. We can force a single-column select to return a DataFrame as well by adjusting
```python
df['Age']
```
All we need to do is pass a list instead of a single string:

In [17]:
# Force a single-column selection to return a DataFrame
df[['Age']]

|   | Age |
|---|-----|
| 0 | 45 |
| 1 | 5 |
| 2 | 30 |
| 3 | 45 |
| 4 | 10 |
| 5 | 45 |
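
Two other spellings give the same one-column DataFrame: `Series.to_frame()` and the `.loc` indexer with a list of column names. This is only a sketch of equivalents; which one reads best is a matter of taste.

```python
import pandas as pd

df = pd.DataFrame(data=[['Jerry', 45, 0, 'gmail', 'Tim'],
                        ['Terry', 5, 1, 'gmail', 'Buddy'],
                        ['Mary', 30, 1, 'yahoo', 'Cooper'],
                        ['Larry', 45, 0, 'MSN', 'Buddy'],
                        ['Gary', 10, 1, 'MSN', 'Slappy'],
                        ['Larry', 45, 0, 'MSN', 'Buddy']],
                  columns=['Person', 'Age', 'HappyFlag', 'Email', 'DogName'])

# Double brackets, as shown above
with_brackets = df[['Age']]

# Select the Series first, then convert it to a one-column DataFrame
with_to_frame = df['Age'].to_frame()

# .loc with "all rows" and a list of column names
with_loc = df.loc[:, ['Age']]

print(type(with_to_frame).__name__, type(with_loc).__name__)
print(with_brackets.equals(with_to_frame))
print(with_brackets.equals(with_loc))
```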
