Chapter 14: Integrating Suggestions, Restructure Sections, and Update Exercise #201

Merged
mmcky merged 18 commits into main from lec-pandas-integration
Aug 11, 2022

Conversation

@HumphreyYang
Member

@HumphreyYang HumphreyYang commented Aug 9, 2022

Hi @jstac and @mmcky ,

In this PR on pandas lecture, I have performed the following tasks:

  1. Integrated Thomas's suggestions on finding data using conditions and changing values in a dataframe;
  2. Removed redundant content;
  3. Restructured the DataFrame section into subsections by topic;
  4. Added an application section that briefly discusses where these techniques are applied;
  5. Removed Sony from the exercise because its data is missing;
  6. Addressed #192 by shifting the exercise solutions to immediately after the exercises in Chapter 14;
  7. Added explanations to the code.

During the revision, I found that our discussion of pandas could be expanded further. For example, data wrangling and data analysis with pandas dataframes could be covered in a separate chapter in future iterations.

Could you please kindly review these changes and provide some feedback on this version?

Comment thread lectures/pandas.md Outdated
A `DataFrame` is an object for storing related columns of data.

Let's start with Series
We begin with creating a series with four random observations
Contributor

We begin by creating a series with

Comment thread lectures/pandas.md
### Select Data by Position

In practice, one thing that we do all the time with a dataframe is we want to find, select and work with a subset of the data of our interests.

Contributor

Please cut "we want to"

Comment thread lectures/pandas.md Outdated

Real world datasets can be [enoumous](https://developers.google.com/machine-learning/data-prep/construct/collect/data-size-quality).

It is sometimes desirable to work with a subset of data for computational efficiency and reduce redundancy to save space.
Contributor

The grammar in this sentence is not quite right. Maybe "...to enhance computational efficiency, reduce redundancy and save space"?

Comment thread lectures/pandas.md Outdated

This function can be some built-in functions like `max`, a `lambda` function, or user-defined function.

Here is an example using `max` function
Contributor

using the max function

Comment thread lectures/pandas.md Outdated

`df.apply()` here returns a series of boolean values rows that satisfies the condition specified in the if-else statement.

In addition, it also defined a subset of variables of interest.
Contributor

defines ?
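
For readers following the thread, the pattern described in the quoted lecture text can be sketched as below. This is only an illustration: the data, the column names `POP` and `tcgdp`, and the helper `is_large` are hypothetical and are not taken from the lecture source.

```python
import pandas as pd

# Hypothetical data; values and column names are illustrative only.
df = pd.DataFrame(
    {"POP": [100.0, 2500.0, 40.0], "tcgdp": [5.0, 80.0, 1.5]},
    index=["A", "B", "C"],
)

def is_large(row):
    # Return True for rows that satisfy the condition, False otherwise.
    return True if row["POP"] > 50 else False

# With axis=1 the function is applied row by row, giving a boolean Series.
mask = df.apply(is_large, axis=1)

# Use the boolean Series to select rows, and name the columns of interest.
subset = df.loc[mask, ["POP"]]
```

Here `mask` selects the rows and the column list passed to `.loc` defines the subset of variables.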

Comment thread lectures/pandas.md Outdated
df
```

3. We can use `.apply()` method to modify by rows/columns as a whole
Contributor

use the

Comment thread lectures/pandas.md Outdated
df.apply(update_row, axis=1)
```

4. We can use `.applymap()` method to modify all individual entries in the dataframe altogether.
Contributor

use the
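
A minimal sketch of element-wise modification follows, for context. The name `replace_nan` appears later in the diff, but its body is not shown in this thread, so the definition here is an assumption, as is the sample data. Note that `applymap` was renamed to `DataFrame.map` in pandas 2.1, so the sketch uses whichever method is available.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing entries.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})

def replace_nan(x):
    # Assumed behaviour: replace a missing entry with 0, keep other entries.
    return 0.0 if pd.isna(x) else x

# Prefer DataFrame.map (pandas >= 2.1); fall back to applymap on older versions.
elementwise = getattr(df, "map", None) or df.applymap

# The function is called once per entry and a new dataframe is returned.
cleaned = elementwise(replace_nan)
```

The original `df` is left unchanged; the result must be assigned to keep it.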

Comment thread lectures/pandas.md Outdated
df
```

`zip` function here creates pairs of values at the corresponding position of the two lists (i.e. [0,3], [3,4] ...)
Contributor

Here, the zip function creates

Comment thread lectures/pandas.md Outdated
df.applymap(replace_nan)
```

Pandas also provides us with convinient methods to replace missing values
Contributor

convenient
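
The quoted sentence does not say which methods it means. Assuming it refers to `DataFrame.fillna`, a short sketch with hypothetical data might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 4.0, 6.0]})

# Replace every missing value with a constant.
filled_zero = df.fillna(0)

# Replace missing values with the mean of their own column.
filled_mean = df.fillna(df.mean())
```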

@jstac
Contributor

jstac commented Aug 9, 2022

Nice work @HumphreyYang, many thanks. Please see the comments above. If you can, please find a way to check spelling and grammar. (One way to get both right is to write very short, simple sentences. You are doing well, please keep pushing in this direction.)

@mmcky , I'll leave you to review and merge from here. Thanks.

@HumphreyYang
Member Author

HumphreyYang commented Aug 10, 2022

Hi @jstac,
Thank you so much for your detailed checks. I will be more careful in future iterations.

Hi @mmcky,
I went through the text again. I corrected typos and reduced redundancy in the text. Could you please review my latest version and let me know if there is anything I need to change? Thank you.

Comment thread lectures/pandas.md
"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","5.108067988"
```
Let's look at an example that reads data from the CSV file `pandas/data/test_pwt.csv`, which is taken from the [Penn World Tables](https://www.rug.nl/ggdc/productivity/pwt/pwt-releases/pwt-7.0).

Contributor

@HumphreyYang thanks for linking this to the PWT v7.0. It might be worth adding a small table with the variable names and the description to make it crystal clear. What do you think?

Contributor

@mmcky mmcky left a comment

Thanks @HumphreyYang, this is nice work. Thanks @jstac for your comments also.

I have just made one comment re: defining the variable names in the Penn World Tables data sample to make that even clearer. We were confused when we talked on Zoom so it's probably a good thing to do.

Once that is added I'll send a copy to Tom for comments.

@HumphreyYang
Member Author

HumphreyYang commented Aug 10, 2022

> thanks @HumphreyYang this is nice work. Thanks @jstac for your comments also.
>
> I have just made one comment re: defining the variable names in the Penn world tables data sample to make that even clearer. We were confused when we talked on Zoom so it's probably a good thing to do.
>
> Once that is added I'll send a copy to Tom for comments.

Hi @mmcky,

Thank you for your comment. I have added the table in the latest commit. I omitted details such as "at current price" and "G-K method" since the table is only meant to give readers an idea of what these variables are. Do you think I should put these details in?

Thank you.

@mmcky
Contributor

mmcky commented Aug 10, 2022

Just the meaning of the variable is the right call. Thanks @HumphreyYang

@HumphreyYang
Member Author

HumphreyYang commented Aug 10, 2022

> Just the meaning of the variable is the right call. Thanks @HumphreyYang

Thanks @mmcky. Could you please review the latest deployment and let me know if there is anything I need to mention or change?

@mmcky mmcky merged commit 09de43a into main Aug 11, 2022
@mmcky mmcky deleted the lec-pandas-integration branch August 11, 2022 01:15