Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can the installation of pandas/numpy be optimized/made optional #194

Open
windrad6 opened this issue Jun 8, 2023 · 11 comments
Open

Can the installation of pandas/numpy be optimized/made optional #194

windrad6 opened this issue Jun 8, 2023 · 11 comments
Assignees
Labels
feature request Request a potential feature question Further information is requested

Comments

@windrad6
Copy link
Contributor

windrad6 commented Jun 8, 2023

After trying to install FiLiP on a Raspberry PI I ran into issues with pandas/numpy. I checked the code and pandas is mainly used in examples, tests and tutorials. For the actual library, it is only used in models/ngsi_v2/units.py for load_units().

Can we make it optional?

It is one of the main drivers for time consumption during setup and a source of incompatibilities on different operating systems.

@windrad6 windrad6 added the bug Something isn't working label Jun 8, 2023
@djs0109
Copy link
Contributor

djs0109 commented Jun 12, 2023

Hi, I see the pandas and numpy are also being used in timeseries.py. Do you have an idea how to change it in a clean way? Otherwise, a local branch or a fork would be the quick and dirty solution

@Maghnie Maghnie self-assigned this Jun 14, 2023
@github-actions
Copy link

@Maghnie
Copy link
Contributor

Maghnie commented Jun 22, 2023

@windrad6 Could you give a bit more info about the Raspberry PI OS and which commands exactly are you using to install filip?

@Maghnie Maghnie added the question Further information is requested label Jun 22, 2023
@tstorek
Copy link
Collaborator

tstorek commented Jun 23, 2023

@windrad6 The TimeSeriesApi strongly relies on pandas as it is designed to parse the data to a format that users can directly trough into theier analysis algorithms. Making pandas optional would have two consequences:

  1. TimeSeries Objects are not convertible to Pandas Dataframes directly. Furthermore, it most likely requires restructuring the code because the imports would be missing in the TimeSeries Models. An inline import in the pandas correlated functions could fix this.
  2. The Code for unit validation would become far more complicated without pandas and require a lot of refactoring to convert it to a basic dict-based code. This should convert the Units to dicts that has either the "Name" or "Code" as keys(). The csv-files in the data folder may prohibit the automatic download via "load_datapackage". Hence, again this would require an inline import in the "load_datapackage" function.
    I think that is already partially implemented in the get_keys() function ;)

The installation of numpy/pandas on raspberry pi is a known issue and it is a pitty that there is no real way to fix that without waiting for hours. Nevertheless, it is probably not the worst idea to reduce dependencies of the library if not required :)

@Maghnie
Copy link
Contributor

Maghnie commented Jul 24, 2023

@tstorek Would removing the dependency on pandas become less of a hassle if filip had a separate "lite" installation setup?

Most of the pandas usages are in the examples and tests, which the regular filip user doesn't need.

We would "just" have to figure out how to deal with the usages in Units.

@tstorek
Copy link
Collaborator

tstorek commented Jul 24, 2023

@Maghnie this is incorrect. Examples and tests are independent from the actual package installation. Removing pandas would partly break the timeseries models as already pointed out. Nevertheless, it is probably worth thinking of making unit validation not as strict as it is implemented right now or as an extension package. The current implementation may change with the migration to pydantic v2 (#199) anyway.

The decision on this is up to the core-developer-team. (@djs0109)

@djs0109
Copy link
Contributor

djs0109 commented Jul 24, 2023

@Maghnie I find your idea of creating a "lite" version good. Do you already have an idea of what functionalities are not required? For example, Unit, TimeSeries API, and so on. I am not sure if a "lite" version can be made within the same repo, or should you make a fork? But as for now, I can only promise support in reviewing because our funded projects have not yet encountered such a requirement.

@djs0109 djs0109 added the feature request Request a potential feature label Jul 24, 2023
@Maghnie
Copy link
Contributor

Maghnie commented Jul 24, 2023

this is incorrect.

What is correct then is the practice of making controversial statements to insure a speedy reply, which is confirmed here. :)

Removing pandas would partly break the timeseries models as already pointed out.

I'm not closely familiar with the code but after skimming it, I'm not sure how removing pandas breaks the TimeSeries models, since its to_pandas method seems to be used only in tests.
image

@Maghnie
Copy link
Contributor

Maghnie commented Jul 24, 2023

Do you already have an idea of what functionalities are not required? For example, Unit, TimeSeries API, and so on.

Yes, I would start there too. We could later look into other bulky packages, if needed.

I am not sure if a "lite" version can be made within the same repo, or should you make a fork?

I imagine it can be done in the same repo, but just with different installation instructions.

But as for now, I can only promise support in reviewing because our funded projects have not yet encountered such a requirement.

Sure, I understand. Filip works in I-GReta too, it just takes a lot of time (aka money) for the installation.

@tstorek
Copy link
Collaborator

tstorek commented Jul 24, 2023

@Maghnie Let me clarify. to_pandas is a functionality provided by filip to the user. This would not be possible without pandas. Hence, removing pandas would break this functionality of the model. But as pointed it out this would need some rafactoring to make it optional.

Examples only use it for demonstration reasons and Tests insure that it works as expected. Usually, when using installation via pip the user does not even regognize their existence.

The Unit model is more complex and would require a little more refactoring because it uses pandas to manage all unit-information. Like a tiny database. Furthermore, the attribute models automatically enforce unit validation and auto-completion if some special keywords are used. Especially, on edge-device level this prohibts falsy unit usage. But apparently to the mentioned use-case it comes with some drawbacks that I was not aware of when I implemented that feature.

Nevertheless, personally, I agree with you that making both functionalities optinal is probably a good idea. But it seems easier as you might expect. That is what I tried to say. Nevertheless, splitting the library into several parts will most likely favor its usabilty in different scenarios.

BTW: I suggest to have a look into optional imports and extra dependencies before starting an additonal repo:

try:
   import pandas as pd
   has_pandas = True
ecxept ImportError():
  has_pandas = False


if has_pandas:
  do something with pandas....
else:
  raise ImportError() 

Furthermore, you need to adjust the setup.py and add extras. Have a look here: https://github.com/RWTH-EBC/ebcpy/blob/master/setup.py

You can find further exmaples online.

@djs0109 djs0109 removed the bug Something isn't working label Jan 15, 2024
@Maghnie
Copy link
Contributor

Maghnie commented Jul 10, 2024

@windrad6 As a workaround for the time-consuming setup, could you try installing Filip with uv?

uv seems to be more efficient than "normal pip" as an installer and package resolver.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Request a potential feature question Further information is requested
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

4 participants