- Define tidy data and explain why it is an optimal format for data analysis.
- Transform data into the tidy data format using pandas.
- Demonstrate fundamental programming concepts such as loops and conditionals.
- Understand the key data structures in Python.
- Read data into Python from standard (e.g., .csv) and non-standard plain text files, as well as common spreadsheet file types (e.g., .xls).
- Construct simple plots using pandas.
- Manipulate a single data table by:
7.1 Filtering rows based on a criterion or combination of criteria.
7.2 Selecting variables.
7.3 Creating new variables and modifying pre-existing ones.
7.4 Rearranging the observations or variables by sorting.
- Manage and manipulate data with dates and times, missing values and categorical variables as well as renaming dataframe columns.
- Use the split-apply-combine approach to iterate over and summarize data by groups.
- Produce human-readable code that incorporates best practices of programming and coding style.
- Understanding Dataframes
- Reading in packages, libraries/modules
- Simple table manipulations (selecting) using Pandas
- Saving Dataframes as variables
- Indexing a dataframe using `.iloc[]` and `.loc[]`
- pandas plotting to make a scatter plot
- Obtain simple summary statistics of a dataframe
- Writing data and saving plots
- (In the assignment introduce Jupyter notebooks)
By the end of the module, students are expected to:
- Describe the components of a Dataframe.
- Read a standard `.csv` file using Pandas `pd.read_csv()`.
- Load the `pandas` library into Python.
- Demonstrate indexing and slicing with `.loc[]` and `.iloc[]`.
- Demonstrate selecting columns of a dataframe using `df[]` notation.
- Obtain values from a dataframe using `.loc[]`.
- Sort a dataframe using `.sort_values()`.
- Create simple summary statistics using `.describe()`.
- Construct simple visualizations using Altair.
- Create a `.csv` file from a dataframe using `.to_csv()`.
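The outcomes above can be sketched in a few lines. This is only an illustration: the table contents and column names are made up, and the data is read from an in-memory string rather than a real file so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical data standing in for a .csv file on disk.
csv_text = """name,weight,calories
Mars,51,230
Twix,58,250
Skor,39,200
"""

# pd.read_csv() accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Select a single column (a Series) and several columns (a DataFrame).
names = df["name"]
subset = df[["name", "calories"]]

# .iloc[] indexes by integer position; .loc[] indexes by label.
first_two_rows = df.iloc[0:2]          # rows at positions 0 and 1
twix_calories = df.loc[1, "calories"]  # row labelled 1, column "calories"

# Sort by a column and compute summary statistics.
by_weight = df.sort_values(by="weight")
summary = df.describe()

# .to_csv() returns a string when no path is given; pass a path to write a file.
csv_out = df.to_csv(index=False)
```

Passing a path such as `df.to_csv("candy.csv", index=False)` (a hypothetical file name) would write the same text to disk instead.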
- Read different files using Pandas `pd.read_csv()` and other functions
- Simple dataframe manipulations and operations
- Filtering using `df[]`
- Chaining `df.groupby()` and `df.agg()`
- Modify values in a dataframe using `df.apply()` and `df.applymap()` (renamed `df.map()` in pandas 2.1)
By the end of the module, students are expected to:
- Demonstrate how to rename columns of a dataframe using `.rename()`.
- Create new columns in a dataframe using `.assign()` notation.
- Drop columns in a dataframe using `.drop()`.
- Use `df[]` notation to filter rows of a dataframe.
- Calculate summary statistics on grouped objects using `.groupby()` and `.agg()`.
- Explain when chaining is appropriate.
- Demonstrate chaining over multiple lines and verbs.
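A rough sketch of how these verbs might be chained over multiple lines follows. The dataframe and its column names are invented for illustration; the point is the shape of the chain, not the data.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {
        "Species": ["cat", "cat", "dog", "dog"],
        "mass_kg": [4.0, 5.0, 20.0, 30.0],
        "notes": ["a", "b", "c", "d"],
    }
)

# Wrapping the chain in parentheses lets each verb sit on its own line.
summary = (
    df.rename(columns={"Species": "species"})
    .assign(mass_g=lambda d: d["mass_kg"] * 1000)  # create a new column
    .drop(columns=["notes"])                       # remove a column
    .loc[lambda d: d["mass_kg"] > 4.5]             # keep rows matching a condition
    .groupby("species")
    .agg({"mass_g": "mean"})                       # one summary row per group
)

# The same row filter written with df[] notation and a Boolean mask.
heavy = df[df["mass_kg"] > 4.5]
```

Chaining tends to read well when each step feeds directly into the next and no intermediate result is needed later; when an intermediate table is reused, saving it to a variable is usually clearer.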
- Tidy data - what is it?
- Manipulating data using `df.melt()` and `df.pivot()`
- Dataframe stacking and unstacking
- Combining dataframes
By the end of the module, students are expected to:
- Explain what tidy data is.
- Use `.melt()` and `.pivot()` to reshape dataframes, specifically to make tidy data.
- Learn how to reset a dataframe's index.
- Combine dataframes using `.merge()` and `.concat()` and know when to use these different methods.
- Understand the different joining methods.
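To make the reshaping and combining outcomes concrete, here is a small sketch. All tables, column names and values are hypothetical; it assumes a "wide" table with one column per year as the untidy starting point.

```python
import pandas as pd

# Hypothetical wide table: one column per year.
wide = pd.DataFrame({"country": ["A", "B"], "2019": [10, 20], "2020": [11, 21]})

# .melt() gathers the year columns into one row per (country, year) pair.
tidy = wide.melt(id_vars="country", var_name="year", value_name="cases")

# .pivot() spreads them back out; .reset_index() turns the index into a column again.
wide_again = tidy.pivot(index="country", columns="year", values="cases").reset_index()

# pd.concat() stacks tables that share the same columns.
more = pd.DataFrame({"country": ["C"], "year": ["2019"], "cases": [30]})
stacked = pd.concat([tidy, more], ignore_index=True)

# .merge() joins tables on a key; `how` selects the joining method.
capitals = pd.DataFrame({"country": ["A", "C"], "capital": ["Aville", "Cville"]})
inner = stacked.merge(capitals, on="country", how="inner")  # only keys found in both
left = stacked.merge(capitals, on="country", how="left")    # keep every row of stacked
```

In this sketch the inner join drops country B because it has no match in `capitals`, while the left join keeps it with a missing `capital`.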
- Basic datatypes - within a dataframe?
- Lists and tuples
- String methods
- Dictionaries - how to convert them to a dataframe?
By the end of the module, students are expected to:
- Compare and contrast Python's key data types.
- Compare and contrast Python's key data structures.
- Use Python to determine the type and structure of an object.
- Demonstrate how to create data structures and convert them to another.
- Identify which operations can be applied to different data types and columns dtypes.
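The following sketch touches each of these outcomes once. The variable names and values are arbitrary examples, chosen only to show the types and conversions involved.

```python
import pandas as pd

# Core data types.
count = 3        # int
price = 2.5      # float
label = "apple"  # str
in_stock = True  # bool

# Core data structures.
fruits = ["apple", "pear"]            # list: ordered, mutable
point = (1.0, 2.0)                    # tuple: ordered, immutable
prices = {"apple": 2.5, "pear": 3.0}  # dict: key-value pairs

# type() reports what kind of object a name refers to.
kinds = [type(count), type(price), type(label), type(in_stock)]

# Converting one structure into another.
fruits_tuple = tuple(fruits)
price_items = list(prices.items())

# A dictionary of equal-length lists converts to a dataframe, one column per key.
df = pd.DataFrame({"fruit": fruits, "price": [2.5, 3.0]})
column_dtypes = df.dtypes  # the dtype of each column

# String methods apply to str values; arithmetic applies to numeric ones.
shout = label.upper()
total = count * price
```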
- DRY (Don't Repeat Yourself)
- Loops
- Loops to read in data
- Nested loops
- Conditions in loops
- Intro to functions
By the end of the module, students are expected to:
- Explain the DRY principle and how it can be useful.
- Write conditional statements with `if`, `elif` and `else` to run different code, depending on the input.
- Write `for` loops to repeatedly run code.
- Describe the expected outcome of code with nested loops.
- Define and use a function that accepts parameters and returns values.
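As one possible illustration of these outcomes, the sketch below defines a small function with conditionals and calls it from a loop. The function name, thresholds and readings are invented for the example.

```python
def classify_temperature(celsius):
    """Return a coarse label for a temperature in degrees Celsius."""
    if celsius < 0:
        return "freezing"
    elif celsius < 20:
        return "cool"
    else:
        return "warm"


# A for loop applies the same function to every reading instead of
# copying the if/elif/else block once per value (the DRY principle).
readings = [-5, 12, 25]
labels = []
for reading in readings:
    labels.append(classify_temperature(reading))

# Nested loops: the inner loop runs to completion for each outer value.
pairs = []
for row in range(2):
    for col in range(3):
        pairs.append((row, col))
```

Here `labels` ends up as `["freezing", "cool", "warm"]`, and `pairs` holds six tuples, from `(0, 0)` through `(1, 2)`.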
- Functions example in plotting, add data to a dataframe?
- Keyword arguments (default)
- Docstrings
- Unit testing
By the end of the module, students are expected to:
- Evaluate the readability, complexity and performance of a function.
- Write docstrings for functions following the NumPy/SciPy format.
- Write comments within a function to improve readability.
- Write and design functions with default arguments.
- Explain the importance of scoping and environments in Python as they relate to functions.
- Formulate test cases to prove a function design specification.
- Use `assert` statements to formulate a test case to prove a function design specification.
- Use test-driven development principles to define a function that accepts parameters, returns values and passes all tests.
- Handle errors gracefully via exception handling.
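A minimal sketch combining several of these outcomes is shown below: a NumPy-style docstring, a default argument, `assert`-based test cases and basic exception handling. The function `average` is a hypothetical example, not part of the course material.

```python
def average(values, ndigits=2):
    """
    Compute the arithmetic mean of a sequence of numbers.

    Parameters
    ----------
    values : list of float
        The numbers to average. Must not be empty.
    ndigits : int, optional
        Number of decimal places to round to (default is 2).

    Returns
    -------
    float
        The rounded mean of `values`.

    Raises
    ------
    ValueError
        If `values` is empty.
    """
    # Fail early with a clear message rather than a ZeroDivisionError.
    if len(values) == 0:
        raise ValueError("values must contain at least one number")
    return round(sum(values) / len(values), ndigits)


# Test cases written as assert statements.
assert average([1, 2, 3]) == 2.0
assert average([1, 2, 2]) == 1.67             # default ndigits=2
assert average([1, 2, 2], ndigits=1) == 1.7   # overriding the default

# Exception handling: recover from bad input instead of crashing.
try:
    result = average([])
except ValueError as error:
    result = None
    message = str(error)
```

In a test-driven workflow, the `assert` lines would typically be written first and the function body filled in until they all pass.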
- Importing your created functions from a different file
- `pytest`
- Style guides and coding style - black
- Python debugger (PDB) (video in notebook instead with MC question)
By the end of the module, students are expected to:
- Describe what Python libraries are, as well as explain when and why they are useful.
- Identify where code can be improved concerning variable names, magic numbers, comments and whitespace.
- Write code that is human readable and follows the black style guide.
- Import files from other directories.
- Use `pytest` to check a function's tests.
- When running `pytest`, explain how pytest finds the associated test functions.
- Explain how the Python debugger can help rectify your code.
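For the test discovery outcome, it may help to see what a test file looks like. With its default settings, pytest collects files named `test_*.py` or `*_test.py` and, inside them, functions whose names start with `test`. The sketch below shows the contents of a hypothetical file `test_shapes.py`; the function under test is defined inline only to keep the example self-contained.

```python
# Contents of a hypothetical file named test_shapes.py.
# In a real project, rectangle_area would usually live in its own module
# (say, shapes.py) and be imported here with:
#     from shapes import rectangle_area


def rectangle_area(width, height):
    """Return the area of a rectangle."""
    return width * height


# pytest collects these because their names start with "test_".
def test_rectangle_area_of_unit_square():
    assert rectangle_area(1, 1) == 1


def test_rectangle_area_scales_with_width():
    assert rectangle_area(3, 2) == 6
```

Running `pytest` in the directory containing this file would report two passing tests; a helper whose name does not start with `test`, such as `rectangle_area` itself, is not collected.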
- (Perhaps: NumPy arrays, pandas relationship explained from old Module 6)
- Working with DateTime format
- Working with strings in dataframes?
- Identifying and handling missing/erroneous values
- Pandas profiling
By the end of the module, students are expected to:
- Use NumPy to create ndarrays with `np.array()` and from functions such as `np.arange()`, `np.linspace()` and `np.ones()`.
- Describe the shape, dimension and size of an array.
- Identify null values in a dataframe and manage them by removing them using `.dropna()` or replacing them using `.fillna()`.
- Manipulate non-standard date/time formats into standard Pandas datetime using `pd.to_datetime()`.
- Find and replace text in a dataframe using verbs such as `.replace()` and `.str.contains()`.
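The outcomes above might look like the following in practice. The table is hypothetical, and the day/month/year date format is an assumption made for the example.

```python
import numpy as np
import pandas as pd

# Creating ndarrays.
from_list = np.array([[1, 2, 3], [4, 5, 6]])
counting = np.arange(0, 10, 2)     # start, stop (exclusive), step
spaced = np.linspace(0.0, 1.0, 5)  # 5 evenly spaced points, endpoints included
ones = np.ones((2, 2))

# Shape, number of dimensions, and total number of elements.
array_info = (from_list.shape, from_list.ndim, from_list.size)

# Hypothetical table with a missing value and non-standard date strings.
df = pd.DataFrame(
    {
        "city": ["Vancouver BC", "Toronto ON", "Victoria BC"],
        "visited": ["03/01/2021", "15/02/2021", "28/02/2021"],
        "rating": [4.0, np.nan, 5.0],
    }
)

missing = df["rating"].isna()                        # Boolean mask of null values
dropped = df.dropna(subset=["rating"])               # remove rows with a null rating
filled = df.fillna({"rating": df["rating"].mean()})  # or replace nulls with the mean

# An explicit format avoids day/month ambiguity.
dates = pd.to_datetime(df["visited"], format="%d/%m/%Y")

# String matching and replacement on a text column.
in_bc = df["city"].str.contains("BC")
renamed = df["city"].str.replace("BC", "British Columbia", regex=False)
```

Whether to drop or fill missing values depends on the analysis; filling with the mean is shown here only as one common option.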