#**Data Wrangling**

Data wrangling can be defined as the process of cleaning, organizing, and transforming raw data into the desired format for analysts to use for prompt decision-making. Also known as data cleaning or data munging, data wrangling enables businesses to tackle more complex data in less time, produce more accurate results, and make better decisions. 

Some examples of data wrangling include:

1. Merging multiple data sources into a single dataset for analysis.
2. Identifying gaps in data (for example, empty cells in a spreadsheet) and either filling or deleting them.
3. Deleting data that’s either unnecessary or irrelevant to the project you’re working on.
4. Identifying extreme outliers in data and either explaining the discrepancies or removing them so that analysis can take place.

In this course, we will learn how to perform *Data Wrangling* in Python. Before we proceed, let us revisit some Python basics to brush up our coding skills.

##**1. Introduction to Basic Python**

###1.1 Basic Datatypes

In this section, we will go over simple data types in Python. These are some of the
essential building blocks for handling information in Python. The data types we will
learn are strings, integers, floats (non-whole numbers), lists, and dictionaries.

####**1.1.1 Strings**

A *string* is text, denoted by quotes (single or double). Strings can contain numbers, letters, and symbols. In Python 3, a string is an immutable sequence of Unicode characters.

Some examples of strings:

*'cat'*

*'This is a string.'*

*'5'*

*'walking'*

*'$GOObarBaz340 '*

In [None]:
'An example of a string'

'An example of a string'

**Assigning & Printing a String**

In [None]:
x = 'hello' #Assigning a string to a variable.

In [None]:
x

'hello'

In [None]:
print(x)

hello


Another example of printing:

In [None]:
num = '12'
name = 'Sam'

In [None]:
#Format 1

print('My number is: {one}, and my name is: {two}'.format(one=num,two=name))

My number is: 12, and my name is: Sam


In [None]:
#Format 2

print('My number is: {}, and my name is: {}'.format(num,name))

My number is: 12, and my name is: Sam


**Using split() in Python**

If you look at the string **'cat,dog,horse'**, it looks like it is a list saved in a string. It’s actually a single value, but with the Python string’s built-in split method we can divide the string into smaller pieces by splitting it on the comma character, like so:


In [None]:
'cat,dog,horse'.split(',')

['cat', 'dog', 'horse']

We can split a string on any separator by passing that separator as the argument inside the parentheses.
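As a small illustration of splitting on other separators, the sketch below uses made-up sample strings. The optional second argument (`maxsplit`) limits how many splits are made.

```python
# Sample strings here are invented for illustration only.
date_parts = '2023-01-15'.split('-')      # split on a hyphen
print(date_parts)                         # ['2023', '01', '15']

words = 'red green blue'.split(' ')       # split on a single space
print(words)                              # ['red', 'green', 'blue']

# maxsplit=1 stops after the first separator is found
first_rest = 'a,b,c,d'.split(',', 1)
print(first_rest)                         # ['a', 'b,c,d']
```

Note that every piece returned by `split()` is still a string, even when it looks like a number.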

####**1.1.2 Integer**



An *integer* is a whole number. 

Example: 5, 0, -10, 100, -9999, 25842

If you enter those into your Python interpreter, the interpreter will return them back to you.

Notice that in the string examples in the previous section, we had a '5'. If a number is entered within quotes, Python will process the value as a string, so the integer 5 and the string '5' are not equal.

To test this, enter the following into your interpreter:



In [None]:
5 == '5'


False

In the previous statement, we asked Python whether 5 the integer was the same as '5' the string. What did Python return? **False.**

Numbers stored as strings are treated as characters, and you cannot perform arithmetic on characters. For calculations, we need numeric types.
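A minimal sketch of the difference, and of converting a numeric string into a number with the built-in `int()` and `float()` functions (the variable names are chosen only for this example):

```python
text_number = '5'                 # a number stored as a string

# '+' on two strings joins them instead of adding them
joined = text_number + text_number
print(joined)                     # 55

# int() and float() turn numeric strings into real numbers
as_int = int(text_number)
as_float = float('2.5')

print(as_int + as_int)            # 10
print(as_int == 5)                # True
print(as_float * 2)               # 5.0
```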

####**1.1.3 Floats**

A *float* value is a decimal value.

When a non–whole number is used in Python, it defaults to turning the value into a float. A float uses the built-in floating-point data type for your Python version. This means Python stores an approximation of the numeric value—an approximation that reflects only a certain level of precision.

Notice the difference between the following two numbers when you enter them into
your Python interpreter:

In [None]:
2
2.0
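To see the two types and the limited precision of floats in action, here is a short sketch. The sum `0.1 + 0.2` is a commonly used illustration and is not taken from this course's data.

```python
import math

print(type(2))        # <class 'int'>
print(type(2.0))      # <class 'float'>
print(2 == 2.0)       # True: the values are equal even though the types differ

total = 0.1 + 0.2
print(total)          # 0.30000000000000004
print(total == 0.3)   # False, because floats are stored as approximations

# Two common ways to compare floats safely
print(round(total, 2) == 0.3)       # True
print(math.isclose(total, 0.3))     # True
```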

####**1.1.4 Basic Operations on Numbers**

Now, let's see some basic operations that can be performed on integers and floats.

In [None]:
#Addition

x = 10 + 56
y = 0.5 + 9.9

print(x)
print(y)

66
10.4


In [None]:
#Subtraction

x = 23 - 47
y = 81.9 - 66.75

print(x)
print(y)

-24
15.150000000000006


In [None]:
#Multiplication

x = 61 * 10
y = 9.65 * 0.25

print(x)
print(y)

610
2.4125


In [None]:
#Division

x = 100 / 25
y = 100.0/12.5

print(x)
print(y)

4.0
8.0


In [None]:
#Modulus for Remainder

x = 30 % 10
y = 999.99 % 2.50

print(x)
print(y)

0
2.490000000000009


In [None]:
#Power Indices

x = 5 ** 3
y = 17.5 ** 0.1

print(x)
print(y)

125
1.331385445092909


In [None]:
#Brackets for BODMAS

x = (2 + 3) * (5 + 2.5)

print(x)

37.5


####**1.1.5 List**



A list is a group of values that have some relationship in common. You use a list in Python similarly to how you would use it in normal language. In Python, you can
create a list of items by placing them within square brackets (`[]`) and separating them with commas.

Let’s make a list of groceries in Python:

In [None]:
['milk', 'lettuce', 'eggs']

['milk', 'lettuce', 'eggs']

You can make lists of any Python data type, or any mixture of data types (e.g., strings, integers, and floats).

In [None]:
['hi',1,1,2.5]

['hi', 1, 1, 2.5]

You can also create lists of lists. Let’s say we have a list of names for our animals:

In [None]:
cat_names = ['Walter', 'Ra']
dog_names = ['Joker', 'Simon', 'Ellie', 'Lishka', 'Fido']
horse_names = ['Mr. Ed']
animal_names = [cat_names, dog_names, horse_names]

In [None]:
print(animal_names)

[['Walter', 'Ra'], ['Joker', 'Simon', 'Ellie', 'Lishka', 'Fido'], ['Mr. Ed']]


####**1.1.6 Dictionary**


A *dictionary* is a collection of data that is ordered (it preserves insertion order as of Python 3.7), changeable, and does not allow duplicate keys.

Unlike data types that hold a single value per element, a dictionary holds key:value pairs. Because each value is looked up by its key rather than by its position, retrieval stays fast even for large dictionaries.

In [None]:
# Creating a dictionary with integer keys

Dict = {1: 'Geeks', 2: 'For', 3: 'Geeks'}
print(Dict)

{1: 'Geeks', 2: 'For', 3: 'Geeks'}


We can also access individual elements of the dictionary via their keys.

In [None]:
print(Dict[2])

For
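The sketch below illustrates a few more dictionary behaviours: adding and updating entries, what happens when a key is repeated, and looking up a key that may be missing. The `ages` data is hypothetical and used only for this example.

```python
ages = {'Sam': 30, 'Ana': 25}     # hypothetical sample data

ages['Lee'] = 41                  # add a new key/value pair
ages['Sam'] = 31                  # assign a new value to an existing key
print(ages)                       # {'Sam': 31, 'Ana': 25, 'Lee': 41}

# Keys are unique: repeating a key keeps only the last value
dup = {'a': 1, 'a': 2}
print(dup)                        # {'a': 2}

# ages['Zoe'] would raise a KeyError; .get() returns a default instead
missing = ages.get('Zoe')
fallback = ages.get('Zoe', 0)
print(missing)                    # None
print(fallback)                   # 0
```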


####**1.1.7 Indexing in Python**

Since we have learnt about the various data types, let us move on to learning how to index & slice them to retrieve our relevant data.

**List**



In [None]:
my_list = ['a', 'b', 'c', 'd'] 

#Indexing starts from 0

In [None]:
my_list[1]

'b'

In [None]:
my_list[1:]

#A slice X:Y includes index X and excludes index Y. Leaving Y out runs to the end of the list.

['b', 'c', 'd']
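A few more slicing patterns, sketched on a separate list so the cells above are unaffected (`letters` is a name introduced only for this example):

```python
letters = ['a', 'b', 'c', 'd']

middle = letters[1:3]      # index 1 is included, index 3 is excluded
first_two = letters[:2]    # leaving out the start begins at index 0
last_item = letters[-1]    # negative indexes count from the end
every_other = letters[::2] # a third number sets the step

print(middle)        # ['b', 'c']
print(first_two)     # ['a', 'b']
print(last_item)     # d
print(every_other)   # ['a', 'c']
```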

In [None]:
nest = [1,2,3,[4,5,['target']]]

In [None]:
nest[3][2][0]

#Retrieving data from a nested List

'target'

**String**

In [None]:
s = "Hey there! I'm learning Python!"

In [None]:
s[0:3]

#Extracting Hey

'Hey'

In [None]:
s.split()

['Hey', 'there!', "I'm", 'learning', 'Python!']

**Dictionary**

In [None]:
d = {'key1':'item1','key2':'item2'}

In [None]:
d['key2']

'item2'

####**1.1.8 What can various Data Types do?**

Each of the basic data types can do a variety of things. Here is a list of the data types
we’ve learned about so far, followed by examples of the kinds of actions you can tell
them to do:

• Strings
1. Change case
2. Strip space off the end of a string
3. Split a string

• Integers and decimals
1. Add and subtract
2. Simple math

• Lists
1. Add to or subtract from the list
2. Remove the last item of the list
3. Reorder the list
4. Sort the list

• Dictionaries
1. Add a key/value pair
2. Set a new value to the corresponding key
3. Look up a value by the key
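
The actions listed above can be sketched with built-in methods as follows. All sample values are invented for illustration, and the comments show the value after each step.

```python
# Strings
title = '  Data Wrangling  '
upper_title = title.upper()       # '  DATA WRANGLING  '  (change case)
stripped = title.rstrip()         # '  Data Wrangling'    (strip space off the end)
pieces = title.split()            # ['Data', 'Wrangling'] (split a string)

# Lists
groceries = ['milk', 'eggs', 'lettuce']
groceries.append('bread')         # add to the list
last = groceries.pop()            # remove the last item ('bread')
groceries.reverse()               # reorder: ['lettuce', 'eggs', 'milk']
groceries.sort()                  # sort:    ['eggs', 'lettuce', 'milk']

# Dictionaries
prices = {'milk': 1.5}
prices['eggs'] = 3.0              # add a key/value pair
prices['milk'] = 1.75             # set a new value for an existing key
milk_price = prices['milk']       # look up a value by the key

print(upper_title, stripped, pieces)
print(last, groceries)
print(prices, milk_price)
```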

###**1.2 Defining Functions**

Just like Python's built-in functions, we can define our own functions to keep our code short and avoid writing the same chunk of code again and again.

The general format of a function definition is:

In [None]:
def my_func(param1='default'):
    """
    Docstring goes here.
    """
    print(param1)

In [None]:
my_func()

default


In [None]:
my_func('new param')

new param


Now that we know how this works, let's define a function to return the square of a number. 

In [None]:
def square(x):
    return x**2

In [None]:
out = square(5)

In [None]:
print(out)

25
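
As a further sketch, a function can take several parameters, some with default values, and can be called with positional or keyword arguments. The function `power` below is a hypothetical example written for this section.

```python
def power(base, exponent=2):
    """Return base raised to exponent (squares the base by default)."""
    return base ** exponent

squared = power(5)                     # uses the default exponent of 2
cubed = power(2, 3)                    # positional arguments
big = power(base=2, exponent=10)       # keyword arguments

print(squared)   # 25
print(cubed)     # 8
print(big)       # 1024
```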


###**1.3 Exercises**

What is 7 to the power of 4?

In [None]:
#Write your code here



Split this string into a list.

In [None]:
s = "Hi there Sam!"

In [None]:
#Write your code here

Given the variables, use *format()* to print the following string:


In [None]:
planet = "Earth"
diameter = 12742

In [None]:
#Print



Given this nested list, use indexing to grab the word "hello"

In [None]:
lst = [1,2,[3,4],[5,[100,200,['hello']],23,11],1,7]

In [None]:
#Write your code here



Given this nested dictionary, grab the word "hello". Be prepared, this will be tricky.

In [None]:
d = {'k1':[1,2,3,{'tricky':['oh','man','inception',{'target':[1,2,3,'hello']}]}]}

In [None]:
#Write your code here



Create a function that grabs the email website domain from a string in the form:

    user@domain.com
    
So for example, passing "user@domain.com" would return: domain.com

In [None]:
#Write your code here

def domainGet(email):
    return email.split('@')[-1]

In [None]:
domainGet('user@domain.com')

'domain.com'

Create a basic function that returns True if the word 'dog' is contained in the input string. Don't worry about edge cases like a punctuation being attached to the word dog, but do account for capitalization.

In [None]:
#define your function here

def findDog(st):
    return 'dog' in st.lower().split()

In [None]:
findDog('Is there a Dog here?')

True

Create a function that counts the number of times the word "dog" occurs in a string. Again ignore edge cases.

In [None]:
#define your function here

def countDog(st):
    count = 0
    for word in st.lower().split():
        if word == 'dog':
            count += 1
    return count

In [None]:
countDog('This dog runs faster than the other dog dude!')

2

Let's complete this module with a tricky question. Don't worry if you're not able to solve it right now; with practice, you will be.

**Final Problem**

You are driving a little too fast, and a police officer stops you. Write a function to return one of 3 possible results: "No Ticket", "Small Ticket", or "Big Ticket". If your speed is 60 or less, the result is "No Ticket". If your speed is between 61 and 80 inclusive, the result is "Small Ticket". If your speed is 81 or more, the result is "Big Ticket". The exception is your birthday (encoded as a boolean parameter of the function): on your birthday, your speed can be 5 higher in all cases.

In [None]:
#Define your function here

def caught_speeding(speed, is_birthday):
    
    if is_birthday:
        speeding = speed - 5
    else:
        speeding = speed
    
    if speeding > 80:
        return 'Big Ticket'
    elif speeding > 60:
        return 'Small Ticket'
    else:
        return 'No Ticket'

In [None]:
caught_speeding(81,True)

'Small Ticket'

In [None]:
caught_speeding(81,False)

'Big Ticket'

**Great Job!**