Chapter 14: Integrating Suggestions, Restructure Sections, and Update Exercise by HumphreyYang · Pull Request #201 · QuantEcon/lecture-python-programming

HumphreyYang · 2022-08-09T01:29:26Z

In this PR on the pandas lecture, I have performed the following tasks:

- Integrated suggestions on pandas from Thomas on finding data using conditions and changing values in a dataframe;
- Removed redundancy in content;
- Restructured the DataFrame section into subsections by topic;
- Added an application section to briefly discuss where these techniques are applied;
- Removed Sony from the exercise because its data is missing;
- Addressed "Shift exercise solutions to immediately after exercises" (#192) in Chapter 14;
- Added explanations to code.

During the revision, I found that we could expand our discussion of pandas further. For example, data wrangling and data analysis with pandas dataframes could be covered in a separate chapter in a future iteration.

Could you please kindly review these changes and provide some feedback on this version?


jstac · 2022-08-09T21:29:54Z

-A `DataFrame` is an object for storing related columns of data.

-Let's start with Series
+We begin with creating a series with four random observations


We begin by creating a series with

jstac · 2022-08-09T21:30:37Z

+### Select Data by Position
+
+In practice, one thing that we do all the time with a dataframe is we want to find, select and work with a subset of the data of our interests. 
+


Please cut "we want to"
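For reference, a minimal sketch of the positional selection this subsection introduces, using `DataFrame.iloc`. The dataframe and its column names are hypothetical and are not taken from the lecture.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame(
    {
        "country": ["A", "B", "C"],
        "year": [2000, 2000, 2000],
        "POP": [10.0, 20.0, 30.0],
    }
)

# Rows 0-1 and columns 0-1. As with Python slices, the end point is excluded.
subset = df.iloc[0:2, 0:2]
print(subset)
```

`iloc` indexes purely by integer position, so the same call works whatever the row and column labels are.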

jstac · 2022-08-09T21:32:25Z

+
+Real world datasets can be [enoumous](https://developers.google.com/machine-learning/data-prep/construct/collect/data-size-quality).
+
+It is sometimes desirable to work with a subset of data for computational efficiency and reduce redundancy to save space.


The grammar in this sentence is not quite right. Maybe "...to enhance computational efficiency, reduce redundancy and save space"?

jstac · 2022-08-09T21:32:51Z

+
+This function can be some built-in functions like `max`, a `lambda` function, or user-defined function.
+
+Here is an example using `max` function


using the max function
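A small sketch of the three kinds of function mentioned in this passage, passed to `df.apply()`. The dataframe and the helper `spread` are hypothetical, chosen only to make the results easy to check by hand.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"a": [1, 5, 3], "b": [4, 2, 6]})

# Built-in function: applied to each column by default (axis=0).
col_max = df.apply(max)

# Lambda function: largest value minus smallest value in each column.
col_range = df.apply(lambda col: col.max() - col.min())

# User-defined function: same computation, applied to each row (axis=1).
def spread(values):
    return values.max() - values.min()

row_range = df.apply(spread, axis=1)

print(col_max, col_range, row_range, sep="\n")
```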

jstac · 2022-08-09T21:33:28Z

+
+`df.apply()` here returns a series of boolean values rows that satisfies the condition specified in the if-else statement.
+
+In addition, it also defined a subset of variables of interest.
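To make the pattern described in these lines concrete, here is a sketch under stated assumptions: the data, the thresholds and the helper `keep` are hypothetical. `df.apply(..., axis=1)` returns one boolean per row, and `df.loc` then uses that series together with a list of columns.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D"],
        "year": [2000, 2000, 2000, 2000],
        "POP": [50, 10, 30, 5],
        "cc": [1.0, 2.0, 3.0, 4.0],
    }
)

def keep(row):
    # Different condition depending on the country (hypothetical rule).
    if row["country"] in ["A", "B"]:
        return row["POP"] > 40
    else:
        return row["POP"] < 20

# One boolean per row.
mask = df.apply(keep, axis=1)

# Keep the rows where mask is True and only the variables of interest.
selected = df.loc[mask, ["country", "year", "POP"]]
print(selected)
```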


jstac · 2022-08-09T21:34:33Z

+df
+```
+
+3. We can use `.apply()` method to modify by rows/columns as a whole


use the
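The diff calls `df.apply(update_row, axis=1)` without showing `update_row` in this thread, so the version below is a hypothetical stand-in with a made-up rule. It sketches how a row-wise function can return a modified copy of each row.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"POP": [10.0, 200.0], "XRAT": [1.5, 2.5]})

def update_row(row):
    # Hypothetical rule: cap POP at 100 and leave the other entries alone.
    row = row.copy()
    row["POP"] = min(row["POP"], 100.0)
    return row

# apply returns a new dataframe; df itself is not modified.
updated = df.apply(update_row, axis=1)
print(updated)
```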

jstac · 2022-08-09T21:34:47Z

+df.apply(update_row, axis=1)
+```
+
+4. We can use `.applymap()` method to modify all individual entries in the dataframe altogether.


use the
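A sketch of element-wise modification, with a hypothetical helper `double` and toy data. Note that `DataFrame.applymap` was renamed to `DataFrame.map` in pandas 2.1, so the example picks whichever method the installed version provides.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

def double(x):
    # Called once for every individual entry.
    return x * 2

# Use DataFrame.map when available (pandas >= 2.1), otherwise applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
doubled = elementwise(double)
print(doubled)
```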

jstac · 2022-08-09T21:35:14Z

+df
+```
+
+`zip` function here creates pairs of values at the corresponding position of the two lists (i.e. [0,3], [3,4] ...)


Here, the zip function creates
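A short illustration of the pairing, assuming two lists that start with the values quoted in the sentence; the remaining entries are made up. `zip` yields tuples rather than lists.

```python
# The first two entries of each list match the pairs quoted above;
# the last entry of each list is hypothetical.
rows = [0, 3, 5]
cols = [3, 4, 6]

pairs = list(zip(rows, cols))
print(pairs)  # → [(0, 3), (3, 4), (5, 6)]
```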

jstac · 2022-08-09T21:35:36Z

+df.applymap(replace_nan)
+```
+
+Pandas also provides us with convinient methods to replace missing values
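One such method is `DataFrame.fillna`. The sketch below uses hypothetical toy data and shows two common choices: a constant, and the column means.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 4.0, 8.0]})

# Replace every missing value with a constant.
filled_zero = df.fillna(0)

# Replace missing values with the mean of their own column.
# df.mean() skips NaN by default, giving a = 2.0 and b = 6.0.
filled_mean = df.fillna(df.mean())

print(filled_zero)
print(filled_mean)
```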


jstac · 2022-08-09T21:38:17Z

Nice work @HumphreyYang, many thanks. Please see the comments above. If you can, please find a way to check spelling and grammar. (One way to get both right is to write very short, simple sentences. You are doing well, please keep pushing in this direction.)

@mmcky, I'll leave you to review and merge from here. Thanks.

HumphreyYang · 2022-08-10T00:55:43Z

Hi @jstac,
Thank you so much for your detailed checks. I will be more careful in future iterations.

Hi @mmcky,
I went through the text again. I corrected typos and reduced redundancy in the text. Could you please review my latest version and let me know if there is anything I need to change? Thank you.

mmcky · 2022-08-10T05:56:20Z

-"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","5.108067988"
-```
+Let's look at an example that reads data from the CSV file `pandas/data/test_pwt.csv`, which is taken from the [Penn World Tables](https://www.rug.nl/ggdc/productivity/pwt/pwt-releases/pwt-7.0).



@HumphreyYang thanks for linking this to the PWT v7.0. It might be worth adding a small table with the variable names and the description to make it crystal clear. What do you think?
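A quick way to see the variable names in question is to load the sample and print its columns. The sketch below reads from an in-memory string instead of `pandas/data/test_pwt.csv`. The header row is an assumption, since the column names are not shown in this thread. The data row is the one quoted in the diff above.

```python
import io

import pandas as pd

# Assumed header row (not shown in this thread) followed by the quoted data row.
csv_text = '''"country","country isocode","year","POP","XRAT","tcgdp","cc","cg"
"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","5.108067988"
'''

df = pd.read_csv(io.StringIO(csv_text))

# Inspect the variable names and the inferred types.
print(df.columns.tolist())
print(df.dtypes)
```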

mmcky

thanks @HumphreyYang this is nice work. Thanks @jstac for your comments also.

I have just made one comment re: defining the variable names in the Penn world tables data sample to make that even clearer. We were confused when we talked on Zoom so it's probably a good thing to do.

Once that is added I'll send a copy to Tom for comments.

HumphreyYang · 2022-08-10T06:38:14Z

> thanks @HumphreyYang this is nice work. Thanks @jstac for your comments also.
>
> I have just made one comment re: defining the variable names in the Penn world tables data sample to make that even clearer. We were confused when we talked on Zoom so it's probably a good thing to do.
>
> Once that is added I'll send a copy to Tom for comments.

Hi @mmcky,

Thank you for your comment. I have added the table in the latest commit. I omitted details such as "at current price" and "G-K method", since the table is only meant to give readers an idea of what these variables are. Do you think I should put these details in?

Thank you.

mmcky · 2022-08-10T06:45:43Z

Just the meaning of the variable is the right call. Thanks @HumphreyYang

HumphreyYang · 2022-08-10T06:48:25Z

> Just the meaning of the variable is the right call. Thanks @HumphreyYang

Thanks @mmcky. Could you please kindly review the latest deployment and inform me if there is anything I need to mention or change?

HumphreyYang added 13 commits August 7, 2022 12:17

Cleaning up sections

c35f153

cleaning up sections

c29c915

Conditioning

3296273

fix typos

199be73

Apply method

80fb676

Update Manipulating DataFrame Section

4f30375

Update Subsetting Dataframe

a9648a5

Revert "Update Subsetting Dataframe"

27fc40e

This reverts commit a9648a5.

bug fix

4f461ed

typo fix

9213123

Remove Sony

cfd5e78

change titles and fix typos

364067d

change titles

0a7aea5

jstac reviewed Aug 9, 2022


fix typos

4f764d8

github-actions Bot temporarily deployed to commit August 9, 2022 23:59 Inactive

fix typos

8da6e85

github-actions Bot temporarily deployed to commit August 10, 2022 00:28 Inactive

reduce repetitions in text

2012007

github-actions Bot temporarily deployed to commit August 10, 2022 00:54 Inactive

mmcky reviewed Aug 10, 2022


HumphreyYang added 2 commits August 10, 2022 16:28

Add table for variables

74afc0f

change units

91bdb83

github-actions Bot temporarily deployed to commit August 10, 2022 06:35 Inactive

github-actions Bot temporarily deployed to commit August 10, 2022 06:43 Inactive

mmcky merged commit 09de43a into main Aug 11, 2022

mmcky deleted the lec-pandas-integration branch August 11, 2022 01:15
