Extra columns in pandas dataframe not supported #758

Closed
HuifangYeo opened this issue Feb 23, 2023 · 2 comments
Labels
🐛 bug unexpected or wrong behavior
Milestone

Comments

@HuifangYeo
Contributor

HuifangYeo commented Feb 23, 2023

Steps to reproduce

The notebook was working previously on 0.7.2.

rt_chunks_df = pd.read_csv(
    "https://data.atoti.io/notebooks/intraday-liquidity/cashflow_realtime_20211015.csv",
    chunksize=500,
    parse_dates=["Transaction_Date", "Settlement_Date"],
)

for chunk in rt_chunks_df:
    t = chunk.reset_index()
    payment_tbl.load_pandas(t)
    time.sleep(1)
print("End")

The code above raised the following exception when loading the data with payment_tbl.load_pandas(t):

AtotiJavaException: class org.apache.arrow.vector.util.Text cannot be cast to class java.lang.String (org.apache.arrow.vector.util.Text is in unnamed module of loader org.springframework.boot.loader.LaunchedURLClassLoader @3532ec19; java.lang.String is in module java.base of loader 'bootstrap')

When I read the data directly into a single pandas DataFrame instead of chunks, the load succeeds without issue.

main.zip

Environment

  • atoti: 0.7.3
  • Python: 3.9.13
  • Operating system: win32

Logs (if relevant)

server.log

2023-02-23 09:57:52.311  WARN 38788 --- [Thread-1] i.InstanceMetadataServiceResourceFetcher : Fail to retrieve token 

com.amazonaws.SdkClientException: Failed to connect to service endpoint:
at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:100) ~[atoti-aws.jar:na]
at com.amazonaws.internal.InstanceMetadataServiceResourceFetcher.getToken(InstanceMetadataServiceResourceFetcher.java:91) ~[atoti-aws.jar:na]
at com.amazonaws.internal.InstanceMetadataServiceResourceFetcher.readResource(InstanceMetadataServiceResourceFetcher.java:69) ~[atoti-aws.jar:na]
at com.amazonaws.internal.EC2ResourceFetcher.readResource(EC2ResourceFetcher.java:66) ~[atoti-aws.jar:na]
at com.amazonaws.auth.InstanceMetadataServiceCredentialsFetcher.getCredentialsEndpoint(InstanceMetadataServiceCredentialsFetcher.java:60) ~[atoti-aws.jar:na]
at com.amazonaws.auth.InstanceMetadataServiceCredentialsFetcher.getCredentialsResponse(InstanceMetadataServiceCredentialsFetcher.java:48) ~[atoti-aws.jar:na]
at com.amazonaws.auth.BaseCredentialsFetcher.fetchCredentials(BaseCredentialsFetcher.java:124) ~[atoti-aws.jar:na]
at com.amazonaws.auth.BaseCredentialsFetcher.getCredentials(BaseCredentialsFetcher.java:80) ~[atoti-aws.jar:na]
at com.amazonaws.auth.InstanceProfileCredentialsProvider.getCredentials(InstanceProfileCredentialsProvider.java:166) ~[atoti-aws.jar:na]
at com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper.getCredentials(EC2ContainerCredentialsProviderWrapper.java:75) ~[atoti-aws.jar:na]
at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117) ~[atoti-aws.jar:na]
at io.atoti.loading.s3.impl.S3CredentialsProviderChain.getCredentials(S3CredentialsProviderChain.java:29) ~[atoti-aws.jar:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1269) ~[atoti-aws.jar:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:845) ~[atoti-aws.jar:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:794) ~[atoti-aws.jar:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781) ~[atoti-aws.jar:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755) ~[atoti-aws.jar:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715) ~[atoti-aws.jar:na]
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697) ~[atoti-aws.jar:na]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561) ~[atoti-aws.jar:na]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541) ~[atoti-aws.jar:na]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456) ~[atoti-aws.jar:na]
at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:6431) ~[atoti-aws.jar:na]
at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:6404) ~[atoti-aws.jar:na]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5441) ~[atoti-aws.jar:na]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403) ~[atoti-aws.jar:na]
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1372) ~[atoti-aws.jar:na]
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1346) ~[atoti-aws.jar:na]
at com.amazonaws.services.s3.AmazonS3Client.doesObjectExist(AmazonS3Client.java:1427) ~[atoti-aws.jar:na]
at io.atoti.loading.s3.impl.S3Path.exist(S3Path.java:244) ~[atoti-aws.jar:na]
at io.atoti.loading.impl.AFileBasedDataTable.<init>(AFileBasedDataTable.java:28) ~[patachou-core-6.0.3-20230222-192541-08c2aaa0.jar!/:na]
at io.atoti.loading.parquet.impl.ParquetDataTable.<init>(ParquetDataTable.java:45) ~[patachou-core-6.0.3-20230222-192541-08c2aaa0.jar!/:na]
at io.atoti.loading.parquet.impl.ParquetDataTableFactory.createTable(ParquetDataTableFactory.java:31) ~[patachou-core-6.0.3-20230222-192541-08c2aaa0.jar!/:na]
at io.atoti.loading.parquet.impl.ParquetDataTableFactory.createTable(ParquetDataTableFactory.java:10) ~[patachou-core-6.0.3-20230222-192541-08c2aaa0.jar!/:na]
at io.atoti.api.impl.OutsideTransactionDataApiImpl.inferTypesFromDataSource(OutsideTransactionDataApiImpl.java:105) ~[patachou-core-6.0.3-20230222-192541-08c2aaa0.jar!/:na]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Unknown Source) ~[na:na]
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) ~[py4j-0.10.9.jar!/:na]
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) ~[py4j-0.10.9.jar!/:na]
at py4j.Gateway.invoke(Gateway.java:282) ~[py4j-0.10.9.jar!/:na]
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) ~[py4j-0.10.9.jar!/:na]
at py4j.commands.CallCommand.execute(CallCommand.java:79) ~[py4j-0.10.9.jar!/:na]
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) ~[py4j-0.10.9.jar!/:na]
at py4j.ClientServerConnection.run(ClientServerConnection.java:106) ~[py4j-0.10.9.jar!/:na]
at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]
Caused by: java.net.SocketException: Network is unreachable: no further information
at java.base/sun.nio.ch.Net.pollConnect(Native Method) ~[na:na]
at java.base/sun.nio.ch.Net.pollConnectNow(Unknown Source) ~[na:na]
at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(Unknown Source) ~[na:na]
at java.base/sun.nio.ch.NioSocketImpl.connect(Unknown Source) ~[na:na]
at java.base/java.net.Socket.connect(Unknown Source) ~[na:na]
at java.base/sun.net.NetworkClient.doConnect(Unknown Source) ~[na:na]
at java.base/sun.net.www.http.HttpClient.openServer(Unknown Source) ~[na:na]
at java.base/sun.net.www.http.HttpClient.openServer(Unknown Source) ~[na:na]
at java.base/sun.net.www.http.HttpClient.<init>(Unknown Source) ~[na:na]
at java.base/sun.net.www.http.HttpClient.New(Unknown Source) ~[na:na]
at java.base/sun.net.www.http.HttpClient.New(Unknown Source) ~[na:na]
at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source) ~[na:na]
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(Unknown Source) ~[na:na]
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source) ~[na:na]
at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source) ~[na:na]
at com.amazonaws.internal.ConnectionUtils.connectToEndpoint(ConnectionUtils.java:95) ~[atoti-aws.jar:na]
at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:80) ~[atoti-aws.jar:na]
... 46 common frames omitted

@HuifangYeo HuifangYeo added the 🐛 bug unexpected or wrong behavior label Feb 23, 2023
@HuifangYeo
Contributor Author

It turns out that the pandas DataFrame had an extra column, index (added by reset_index()), which is why I wasn't able to load it into the table. After selecting only the atoti table's columns from the DataFrame, loading works.
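
A minimal sketch of the selection described above, using pandas only. The `table_columns` list is a hypothetical stand-in for the column names of the atoti table, and `Amount` is an invented column used for illustration; only `Transaction_Date` comes from the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the names of the atoti table's columns.
table_columns = ["Transaction_Date", "Amount"]

# Small illustrative frame; "Amount" is an assumed column name.
chunk = pd.DataFrame(
    {"Transaction_Date": pd.to_datetime(["2021-10-15"]), "Amount": [100.0]}
)

# reset_index() turns the default RangeIndex into an extra "index" column.
with_index = chunk.reset_index()
extra_columns = [c for c in with_index.columns if c not in table_columns]

# Keep only the columns the table knows about, in the table's order.
selected = with_index[table_columns]
```

Passing `selected` rather than `with_index` to `load_pandas` avoids sending the extra `index` column to the table.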

Is this the expected behaviour?

HuifangYeo added a commit to atoti/notebooks that referenced this issue Feb 23, 2023
@tibdex tibdex changed the title from "class org.apache.arrow.vector.util.Text cannot be cast to class java.lang.String" to "Extra columns in pandas dataframe not supported" Feb 23, 2023
@tibdex
Member

tibdex commented Feb 23, 2023

I can reproduce.

In 0.7.2 this works:

import pandas as pd
import atoti as tt

session = tt.Session()

table = session.read_pandas(
    pd.DataFrame({"int": [1], "string": ["A"]}),
    table_name="Default",
)
table.load_pandas(pd.DataFrame({"float": [1.5], "int": [2], "string": ["B"]}))

but in 0.7.3 it fails with:

atoti._exceptions.AtotiJavaException: An error occurred while calling o42.loadDataSourceIntoStore.
E           : java.lang.IndexOutOfBoundsException: Index: 2 Size: 2
E           	at java.base/java.util.ImmutableCollections$AbstractImmutableList.outOfBounds(Unknown Source)
E           	at java.base/java.util.ImmutableCollections$List12.get(Unknown Source)
E           	at io.atoti.loading.arrow.ArrowParser.parseColumn(ArrowParser.java:75)
E           	at io.atoti.loading.arrow.ArrowParser.parse(ArrowParser.java:64)
E           	at io.atoti.loading.arrow.ArrowDataTable.loadWithinTransaction(ArrowDataTable.java:51)
E           	at io.atoti.loading.impl.DataTableOperationUnit.executeOperation(DataTableOperationUnit.java:18)
E           	at io.atoti.api.impl.DataApiImpl.executeOperation(DataApiImpl.java:100)
E           	at io.atoti.api.impl.DataApiImpl.loadDataSourceIntoStore(DataApiImpl.java:121)
E           	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E           	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
E           	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
E           	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
E           	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E           	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E           	at py4j.Gateway.invoke(Gateway.java:282)
E           	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E           	at py4j.commands.CallCommand.execute(CallCommand.java:79)
E           	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E           	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E           	at java.base/java.lang.Thread.run(Unknown Source)

When the extra column is at the end:

- table.load_pandas(pd.DataFrame({"float": [1.5], "int": [2], "string": ["B"]}))
+ table.load_pandas(pd.DataFrame({"int": [2], "string": ["B"], "float": [1.5]}))

it works in 0.7.3 too.
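
Based on that observation, a possible stopgap is to reorder the DataFrame so that the table's columns come first and any extra columns come last. The helper below is a hypothetical sketch, not part of atoti, and it has only the single case above as evidence that 0.7.3 accepts the reordered frame.

```python
import pandas as pd


def move_extra_columns_last(df, table_columns):
    """Return df with the table's columns first and any other columns after them."""
    known = [c for c in table_columns if c in df.columns]
    extra = [c for c in df.columns if c not in table_columns]
    return df[known + extra]


# Same frame as in the failing call; "float" is the extra column.
df = pd.DataFrame({"float": [1.5], "int": [2], "string": ["B"]})
reordered = move_extra_columns_last(df, ["int", "string"])
print(list(reordered.columns))  # ['int', 'string', 'float']
```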

This is tracked in an internal ticket.


In your specific case, a fix is to do:

  for chunk in rt_chunks_df:
-    t = chunk.reset_index()
-    payment_tbl.load_pandas(t)
+    payment_tbl.load_pandas(chunk)

since chunk only has the default RangeIndex, which reset_index() would turn into an extra index column.
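
The loop without reset_index() can be tried offline with the sketch below. `RecordingTable` is a hypothetical stand-in for the atoti table that only records the column names it receives, and the inline CSV with its `Amount` column is invented for illustration.

```python
import io

import pandas as pd

# Invented sample data; only the Transaction_Date column name comes from the notebook.
csv_text = "Transaction_Date,Amount\n2021-10-15,1\n2021-10-15,2\n2021-10-15,3\n"


class RecordingTable:
    """Hypothetical stand-in for an atoti table; it only records what it is given."""

    def __init__(self):
        self.loaded = []

    def load_pandas(self, dataframe):
        self.loaded.append(list(dataframe.columns))


payment_tbl = RecordingTable()
chunks = pd.read_csv(
    io.StringIO(csv_text), chunksize=2, parse_dates=["Transaction_Date"]
)

# Each chunk is loaded as-is, so no "index" column is ever added.
for chunk in chunks:
    payment_tbl.load_pandas(chunk)
```

Every chunk reaches the table with exactly the CSV's columns, which is what the table expects.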

@tibdex tibdex added the ⏳ waiting dependency internal dependency development required label Feb 23, 2023
printhellohetal pushed a commit to atoti/notebooks that referenced this issue Feb 24, 2023
* Update table joins to conditional join

* Update real-time risk on Bitnami kafka repo url, quantlib calendar for USA and Atoti's mapping to condition

* Update customer 360 use case from Atoti mapping to condition

* Update SA-CCR to switch join mapping to conditions

* Update XVA to switch join mapping to conditions

* Update join mapping to condition for CCF

* Fixed tsfresh and protobuf conflict issue for collateral shortfall forecast notebook. Update mapping to condition

* Update mapping to condition for collateral shortfall monitoring

* Update mapping to condition for intraday liquidity

* Update mappings to condition for airline industry use case

* Update mappings to conditions for baseball notebook

* Update mapping to conditions for ca-solar

* Update mapping to conditions for digital marketing

* remove mapping for drug efficacy

* Update mapping to conditions for food processing notebook

* Update mapping to conditions for F1

* update mapping to condition for election nb

* Update mapping to condition for global covid nb

* Update object detection to download data and model, mapping to conditions.

* Update mappings to conditions for pokemon nb

* Update mapping to conditions for pricing simulation nb

* Update mapping from mapping to conditions for sales-commission nb

* Update mapping to conditions for twitter nb

* Update mapping to condition for conditional-function nb

* Update mapping to conditions for curr-cov-weighted-avg nb

* Update mapping to condition for curr-coversion nb

* Update mappings to conditions for introductory-tutorial nb

* Update mappings to conditions for rollup-hierarchies nb

* Reorg pandas chart

* Update `user.roles` to user_service_client.individual_roles instead

* Formatting and update cell metadata for notebooks

* Upgrade Atoti CE to v0.7.3

* Workaround for atoti/atoti#758

* Format notebook
@tibdex tibdex removed the ⏳ waiting dependency internal dependency development required label Jul 28, 2023
@tibdex tibdex added this to the Next release milestone Jul 28, 2023
@patachoux patachoux bot closed this as completed Aug 3, 2023