simplified how RUL is calculated using transform method with GroupBy #8

Open
wants to merge 1 commit into
base: master

kylejones200 commented May 27, 2020

Issue #, if available:

Description of changes: The original version takes a roundabout approach to finding the maximum number of cycles for each id. Using DataFrame.groupby with transform, we can compute the max value for each id and assign it directly to the proper column, because transform returns a result aligned with the original rows. This avoids building a separate aggregated DataFrame and merging it back.

Original:

for i, df in enumerate(train_df):
    rul = pd.DataFrame(df.groupby('id')['cycle'].max()).reset_index()
    rul.columns = ['id', 'max']
    df = df.merge(rul, on=['id'], how='left')
    df['RUL'] = df['max'] - df['cycle']
    df.drop('max', axis=1, inplace=True)
    train_df[i]=df

Revised:

# Broadcast each id's maximum cycle back to its rows, then subtract
df['max'] = df.groupby('id')['cycle'].transform('max')
df['RUL'] = df['max'] - df['cycle']
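
As a quick sanity check that the two approaches agree, here is a minimal sketch on a toy DataFrame. The helper names `add_rul_merge` and `add_rul_transform` and the toy data are hypothetical and not part of the repository. The transform variant here also skips the intermediate 'max' column, which is one possible further simplification rather than what this PR changes.

```python
import pandas as pd


def add_rul_merge(df):
    """Original approach: aggregate, merge back, subtract, drop the helper column."""
    rul = df.groupby('id')['cycle'].max().reset_index()
    rul.columns = ['id', 'max']
    out = df.merge(rul, on='id', how='left')
    out['RUL'] = out['max'] - out['cycle']
    return out.drop(columns='max')


def add_rul_transform(df):
    """Transform approach: the per-id max is aligned with the original rows."""
    out = df.copy()
    out['RUL'] = out.groupby('id')['cycle'].transform('max') - out['cycle']
    return out


# Hypothetical toy data: two units, with 3 and 2 recorded cycles
toy = pd.DataFrame({'id': [1, 1, 1, 2, 2], 'cycle': [1, 2, 3, 1, 2]})

merged = add_rul_merge(toy)
transformed = add_rul_transform(toy)
print(transformed)
```

Both helpers should produce the same RUL column. The transform version does so without creating or merging a second DataFrame.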

The data-loading code could be simplified further by using the "names" argument to assign the column labels when the files are read. I didn't make this change because the way the columns list is reused for the test datasets causes issues. That said, the code that reads in the test data is also needlessly complex.
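
For illustration, a minimal sketch of what the "names" suggestion might look like, assuming the files are whitespace-delimited with no header row and are read with `pd.read_csv`. The column labels and the inline sample data below are hypothetical stand-ins, not the repository's actual files.

```python
import io

import pandas as pd

# Hypothetical column labels; the real dataset has more sensor columns
columns = ['id', 'cycle', 'setting1', 's1']

# Stand-in for a data file on disk (assumed whitespace-delimited, no header)
raw = io.StringIO("1 1 0.5 641.8\n1 2 0.3 642.1\n2 1 0.4 641.9\n")

# Labels are assigned at read time, so no separate `df.columns = ...` step is needed
df = pd.read_csv(raw, sep=r'\s+', header=None, names=columns)
print(df.head())
```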

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
